Evaluating Interviews


You’re reading an excerpt of The Holloway Guide to Technical Recruiting and Hiring, a book by Osman (Ozzie) Osman and over 45 other contributors. It is the most authoritative resource on growing software engineering teams effectively, written by and for hiring managers, recruiters, interviewers, and candidates. Purchase the book to support the author and the ad-free Holloway reading experience. You get instant digital access, over 800 links and references, commentary and future updates, and a high-quality PDF download.

When interviewers know what to expect from interview questions—what makes an answer “poor,” “fair,” “good,” or “excellent”—they are less likely to let noise and bias slip into their evaluations. Rubrics are systems that make it easier for interviewers to provide and discuss feedback on candidates, because everyone will be working from the same set of expectations.

A rubric is a set of guidance, usually written, for evaluating candidates’ answers to interview questions. Included in this guidance may be examples of answers at different quality levels, or prompts to help the interviewer perform their assessment. Rubrics may also provide interviewers with a series of questions to use, typically of increasing difficulty.

In a structured interview, rubrics are essential to keep interviewers on track and ensure an effective and fair process. Even with a rubric, it can be tricky to specify exactly what a good or bad performance looks like—there’s always variation in each individual and room for human judgment or interpretation. You might think of rubrics as a starting point to help foster fair and productive interviews and evaluations.

There are many ways to write technical and nontechnical rubrics, and every company has its own method.

Building Rubrics

Teams design rubrics differently. Rubrics can be perfunctory, with just a list of questions to ask and simple pass/fail boxes to check, or they can be detailed, including briefs on why each question is being asked, descriptions of what different levels of success are for each question, and/or what would be expected of different candidate levels for each question. The more detailed the rubric, the more fair and systematic the process—but the greater the challenge of designing and maintaining it.

Rubrics do not remove the need for flexibility in any given interview. They help you to score a candidate’s progress through a question, and can even allow you to be more flexible by preparing you to pivot when something unexpected comes up.

story “All good questions have variable depth. Like, there’s a point halfway through the question where it’s clear the person’s not going to get through it. So you might have a 20-minute version of a 60-minute question. That can go the other way, ‘This person is going through this so fast, let’s make it harder.’ It’s like a rip cord, where 15 minutes in I know whether I can pivot to the shorter version or longer, or end it, or add the next layer of the onion. It’s about preparing to offer extensible difficulty, variable difficulty.” —Scott Woody, former Director of Engineering, Dropbox

story “One way to increase the reach of a question bank is to structure the rubric to call out what answers you would expect at what candidate level. This can help minimize bias (everyone knows what ‘senior’ means for this), and, where it makes sense, reduces the need to have different question sets for different levels.” —Ryn Daniels, Senior Software Engineer, HashiCorp

A Sample Technical Question Rubric

Question: Write a program that prints out the elements of a binary tree in order.

What we are looking for in this question:

  • The candidate asks appropriate clarifying questions, such as what data type is stored in the tree and what the output format should be.

  • The candidate is able to independently write the initial version of the program without significant interviewer intervention or coaching.

  • There are either no bugs, or the candidate is able to find and fix all of their bugs in a proactive, independent fashion (that is, the interviewer does not have to point out that there is a bug or give them a test case).

  • The candidate uses all language features in an appropriate way—it’s OK if they make syntax errors or don’t know the names of library functions.

  • The candidate is able to describe the Big O notation performance of their program accurately, and they do not use unnecessary memory or do inefficient operations such as visiting a node multiple times.

  • An excellent performance: Requires hitting all of the above bullets, and will typically result in a “solid yes” for the candidate.

  • A good performance: Requires at least four of the five bullets—typically, someone can get a “good” rating as long as the issues are largely in making up-front assumptions rather than having significant bugs or logic errors. A good performance will typically result in a “weak yes” for the candidate. The simplicity of this question means that somewhat shaky performance can also result in a “weak no.”

  • A fair performance: The answer fails on multiple topics, such as having multiple bugs and also not being able to describe the Big O performance. A fair performance on a question this easy should result in a “solid no” for the candidate.

  • A poor performance: The candidate cannot complete the problem, even with significant hinting.

important Note that in many technical questions, the rubric will get more technically specific about exactly what kinds of answers are or are not OK for each level. This question happens to be a simple one, so it doesn’t demonstrate much detail.
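
For calibration, it can help to attach a reference solution to the rubric. The sketch below is one possible answer in Python, assuming “in order” means an in-order traversal (left subtree, node, right subtree) and that the tree stores integers. The `Node` class and `in_order` function are illustrative names, not part of the original question. It visits each node exactly once, so it runs in O(n) time, and it uses O(h) stack space for a tree of height h.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """A binary tree node. The integer payload is an assumption."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(root: Optional[Node], visit: Callable[[int], None]) -> None:
    """Call visit on every value: left subtree, then node, then right subtree."""
    if root is None:  # the empty tree has nothing to visit
        return
    in_order(root.left, visit)
    visit(root.value)
    in_order(root.right, visit)


# Usage: print a three-node tree, one value per line (1, then 2, then 3).
tree = Node(2, Node(1), Node(3))
in_order(tree, print)
```

Passing `visit` as a parameter keeps the output format, one of the clarifying questions in the rubric, separate from the traversal itself.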

Evaluating Coding Questions

When writing rubrics for coding questions, keep in mind that there is a great deal more to assessment than whether or not the candidate solved the problem. Interviewers might want to evaluate the following, for example: Was the code well written? Did the candidate reason through the problem well? Did they do a good job of evaluating their own code’s correctness? Were they able to answer follow-up questions?

However, some things are not appropriate to include in the evaluation of a coding interview.

caution Interviewers should ignore anything that is an artificial result of the environment. If the candidate is writing whiteboard code, this includes the candidate’s handwriting and whether the code was visually messy. Viewing variable naming and duplicated code leniently is also wise. If you have concerns, you can ask the candidate about the choices they made; it’s likely they were just avoiding rewriting code or writing long names.

It’s not appropriate to penalize a candidate harshly just because they have a bug! It’s very hard to write correct code, especially in the context of an interview. You will most likely want to see a strong thought process and an ability to translate ideas to code and model the flow of execution of a program, but mistakes will inevitably happen. You can expect that candidates should find bugs when given hints or a test case, though. If someone simply cannot execute their own code objectively, or if they cannot find a bug even when it’s been pointed out to them, that indicates a serious issue.

A Sample Nontechnical Question Rubric

Question: Tell me about a conflict with a co-worker and how you resolved it.

What we are looking for in this question:

  • Do they identify a real conflict?

  • Can they explain the co-worker’s perspective? (Bonus points if they had to do work to discover that perspective.)

  • Can they explain what the root cause of the disagreement was and what the “right” answer to the conflict should be from a third party’s perspective?

  • Did they resolve the conflict in a constructive, low-drama way? (Alternative, bad options: avoiding facing up to the conflict; escalating prematurely; playing politics to get what they want.)

  • Was the solution they reached actually a good solution to the problem from the perspective of a neutral observer?

When asking this question, if you do not hear some of the elements the rubric is looking for, you should ask follow-up questions to touch on those areas. For example, ask how their co-worker saw the situation if they don’t explain it directly on their own.

  • An excellent performance: This requires covering all of these points, or a demonstration that the candidate could have touched on them, without prompting, or with only light/moderate prompting (for example, a raised eyebrow, a questioning look, or a subtle follow-up question designed to nudge the candidate and see if the answer was top of mind).

  • A good performance: Touches on all of the elements of the answers, but might have required heavy prompting in one area (for example, directly asking what the co-worker’s opinion/viewpoint was, or having to dig deep yourself to understand the root cause).

  • A fair performance: Similar to a “good” performance, but with more prompting needed and generally lower-quality answers, giving the interviewer lower confidence in the response (for example, if the candidate cannot give specifics or they remain vague even after prompting).

  • A poor performance: The candidate plays politics, resolves the question in their favor without accounting for the wider interest, or simply can’t give an example of having dealt with conflict.

important People have varying definitions of what a “conflict” is, so you may consider adjusting the phrasing of this question if you feel you’re not getting the right signal. Ryn Daniels recommends asking the candidate, “Tell me about a time you disagreed with a colleague on a decision” or “Tell me about a time when you changed your mind.” Each of these questions can reveal the same things: how the candidate interacts with others, and whether and to what degree the candidate is self-reflective and flexible in the face of new knowledge.

story On nontechnical interviews, the rubrics tend to be less directly referenced because they’re likely to cover individual questions, of which there are many (whereas there tends to be one big technical question), so the write-ups might tend to anchor more on a meta-rubric of the general types of things you’re looking for and highlight specific questions where the interviewer did poorly. —Alex Allain, Engineering Director, Dropbox

Collecting Interviewer Feedback

Interviewer Write-ups

Ideally, interviewers will record their feedback on the candidate as soon as possible. The fresher the interview is in the interviewer’s mind, the more complete and objective it is likely to be. Additionally, since next steps rely on this information, waiting a while to record your feedback can slow down decision-making.

The write-up justifies the decision with concrete evidence based on the rubric, by identifying which parts of the rubric were or weren’t met. A sample write-up based on the technical question above might look like this, for a performance evaluation of “fair”:

The candidate struggled with this problem overall, earning no more than a “fair” on the rubric and a “no hire” on the interview. Mapping how they did to the rubric:

  1. The candidate asked appropriate clarifying questions (“What kind of data is stored in the tree?”, “Do you care about performance?”); they assumed the output would be printing to stdout—the interview was off to a good start.

  2. The candidate struggled to write a correct version of the program; their initial attempt only printed the left and right branches of the tree one level deep.

  3. The candidate struggled to identify their bugs when they walked through the program themself; for example, they failed to recognize that they didn’t handle the empty tree correctly and had to be prompted to handle multiple levels deep. With significant, repeated hinting (including giving a concrete example case) the candidate eventually did fix their issues.

  4. The candidate wrote reasonably idiomatic code.

  5. The candidate couldn’t properly describe the Big O of their solution (they claimed it was O(1) instead of O(n)).
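
The mapping from rubric bullets to an overall rating can be sketched in code. This is a minimal illustration in Python, assuming the five bullets of the sample technical rubric and its thresholds (all five for “excellent,” at least four for “good,” an unfinished problem for “poor”). The `WriteUp` class, the `rate` function, and the criterion names are hypothetical, and a real rubric leaves room for interviewer judgment that a function like this cannot capture.

```python
from dataclasses import dataclass, field
from enum import Enum


class Rating(Enum):
    EXCELLENT = "excellent"
    GOOD = "good"
    FAIR = "fair"
    POOR = "poor"


# Hypothetical short names for the five bullets of the sample rubric.
CRITERIA = (
    "clarifying_questions",
    "independent_first_version",
    "finds_own_bugs",
    "appropriate_language_use",
    "accurate_big_o",
)


@dataclass
class WriteUp:
    completed: bool                 # did the candidate finish, even with hints?
    met: dict[str, bool]            # criterion name -> was it met?
    evidence: dict[str, str] = field(default_factory=dict)  # supporting notes


def rate(write_up: WriteUp) -> Rating:
    """Apply the sample rubric's thresholds to a write-up."""
    if not write_up.completed:
        return Rating.POOR
    hits = sum(write_up.met.get(name, False) for name in CRITERIA)
    if hits == len(CRITERIA):
        return Rating.EXCELLENT
    if hits >= 4:
        return Rating.GOOD
    return Rating.FAIR


# The sample write-up above: bullets 1 and 4 met, bullets 2, 3, and 5 missed.
sample = WriteUp(
    completed=True,
    met={
        "clarifying_questions": True,
        "independent_first_version": False,
        "finds_own_bugs": False,
        "appropriate_language_use": True,
        "accurate_big_o": False,
    },
    evidence={"accurate_big_o": "Claimed O(1) instead of O(n)."},
)
print(rate(sample).value)  # fair
```

Keeping an `evidence` note next to each criterion mirrors the write-up's structure: the rating is only as useful as the concrete observations behind it.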

danger It’s important that no interviewer be exposed to other interviewers’ feedback before they have submitted their own. Seeing other feedback can bias an interviewer, which can diminish the quality and fairness of decisions. This is especially true if someone more junior is exposed to the opinion or feedback of someone more senior. Check to see if your ATS (if you’re using one) can help hide other interviewers’ feedback as it gets submitted.

The level of detail required in interview feedback can vary from company to company, depending on how that feedback will be used. For example, feedback used as notes to help an interviewer remember key points to discuss during a debrief session may be less detailed than feedback that will be shared with and used by an independent hiring committee.

That said, the more comprehensive the feedback, the less room for relying on memory that might fade (even over the course of a few hours or days) or “gut” instinct (that may be prone to bias). And the hiring team may want to revisit that feedback further in the future—for example, when deciding whether to reconsider a candidate or to analyze the hiring decision. Providing a structured form for interviewers to provide feedback makes this easier. Many tools can help enforce timely and structured submission of feedback.


Components of Great Feedback

This section was written by Kevin Morrill.

Comprehensive feedback will do the following:

  • Paint a narrative. An interviewer’s feedback ideally will clearly convey what they talked about with the candidate. What questions did the interviewer ask, and how did the candidate answer them? Are there code snippets that can be included in the feedback? This helps the team draw conclusions from the collective set of interviews. One of the best ways to make a narrative clear is to quote the candidate, instead of jumping straight into your own interpretation.

  • Tie positive and negative aspects to core competencies. Anything that went well or poorly in the interview will ultimately relate back to the competencies, skills, and traits you are interviewing for. At the very least, it’s helpful for the interviewer to connect observations outside core competencies to a predicted situation on the job.

  • Make a predictive statement about on-the-job performance. Why will a particular behavior the candidate demonstrated matter on the job?

  • Be written down. All the data collection in the world is useless unless you write it down and have it available for the debrief session and hiring manager or committee.

  • Convey a clear decision. Interviewers hopefully will walk away from the interview with a clear sense of whether the candidate is a fit. If they’re not convinced, that’s a default “no hire”—there is no such thing as a neutral opinion. The spectrum of “hires” extends from “strong no” to “strong yes.” For a “strong yes,” you will feel like you need to chase them to the parking lot and get them on board immediately, and you would be worried if they joined a competitor. Below that, but still not in the “no hire” range, you may feel like they will raise the average level of performance on the team. Some companies allow the interviewer to specify how confident they are of their rating.

  • Convey secondary signals. For instance, maybe the candidate performed well in a technical interview, but was somewhat rude. It’s useful to disentangle those two signals.

  • Identify gaps. Comparing across interviewers’ feedback, the team can also figure out where there are areas or open questions that need follow-up (either via another interview or through conversations with the hiring manager or recruiter).

  • Support an audit. If you ever make a hiring mistake and have to fire someone, one of the first postmortem activities is to evaluate the interview loop and figure out what happened. If you have great notes, you can learn and improve the process.

Written Feedback Example

Here’s an example of an interviewer’s effective written feedback:

  • Recommendation: HIRE

  • Pros: High intelligence (keen awareness of concretes, able to employ abstractions), results oriented, good communication, aptitude for organizing code effectively.

  • Cons: Questionable organization and time management skills.

  • Details:

    Matt worked out to be a very impressive candidate. I am convinced he would do very well with the kinds of problems we solve and immediately drive value to our customers.

    I started by talking with Matt about evolving our architecture to support data on locations coming from multiple sources and then making a verdict on what the actual location is based on all of that data (rather than our “first-in wins” model). He probed on the problem of when conflicts arise, doing the math to realize that even a 1% conflict rate is about 2,300 results to manually review. He assessed that this is too much human work and that an algorithmic approach is in order. I like that he carried through the math and actually thought about it at a concrete level. Weaker candidates just take the problem as given and plow forward assuming it must be a problem worth solving.

    He talked about technology choice by saying “Do you want to stay Postgres?” I clarified that we’re on MySQL and said he had the power to change technology but needed to have a good reason to do it. He talked through what that would mean and immediately gravitated to indexing needs as being key. For key/value-oriented storage, it would be harder to efficiently query on things like last_updated, as you’d only have one key to work with. At one point in his thought process he said aloud, “My intuition says…” and proceeded to explain his thinking. I took this as a very good sign, since Courtland brought up that he charged forward on a weak answer to interest rate impact, even though he probably didn’t know what he was talking about. This kind of verbal cue says to me that when he’s working in his comfort zone, he documents his assumptions and critically evaluates them.

    At one point I foolishly said that he should probably just ignore the overwritten data; I didn’t mean to trick him, but in retrospect this worked out to be a good example of challenging a manager when they’re wrong. He made a good case for how you could use changes in the data from the same source to tell you the accuracy of a location. This was also another good example of him thinking intelligently on the fly. Love that he could take something abstract, like whether to save data, and pull it down to concrete about a specific startup’s migration to make the decision.

    We then talked through what the guts of his black-box “Rules Engine” would be. I’ve actually been thinking a fair amount about this, having coached Bryan Chang on his dev design. He proceeded to lay out a design while thinking on his feet; it was far better than my own thought process that benefited from far more time and knowledge about our challenges. Right out of the gate with no prompting/leading he said, “If statements would be stupid” you want “rules and authority rank.” He came up with a system where we could pass around references-to-function that embodied the rules. Each function could return either a final answer or a weighted answer. I had never thought about using weighting, which is a pretty cool thought.

    To close out, I asked him the five-minute communication question. He seemed to get right away what things he should be preparing for. He said aloud, “Let me make sure I can encapsulate key points.” He asked if I knew anything about Shakespeare (for some reason many candidates that are worse communicators don’t take the liberty to ask, and this shows in their work as they tend to be very black box and harder to work with). He wrote an outline before he started. He then proceeded to give an impassioned explanation of Shakespeare’s use of verse. The big red flag is that he went over by at least 50% in time and didn’t seem fazed by this despite all his careful preparation. In some cases I’ve seen this be a predictor of poor organization that carries through into delayed projects on the job.

Leaking Candidate Information Internally

danger Be careful discussing candidates on a public forum like Slack. Even if the audience is limited, if one person makes the wrong comment, Slack discussions can easily spiral in unhelpful, inappropriate, or unfair directions. Without care, interactions like this could even become the basis for a lawsuit.

By contrast, a senior employee can create a clearly organized, in-person debrief that avoids these challenges. However, remote teams may treat Slack discussions as the equivalent of in-person debriefs. If this is the case, you can train interviewers on appropriate conduct in this forum, and managers can make sure they steer the conversation if it becomes unhelpful.

While the interview process includes the whole interviewing team—and good decisions here require transparency—there is no reason to reveal information about candidate performance beyond the decision-making audience. It is important to make sure candidates do not accidentally see their own interview feedback or discussions about them, if they do eventually join the team.

Some parts of the hiring process are dictated by law. Laws differ depending on the size of your company and the jurisdiction in which you operate. For example, California prohibits companies from asking candidates about their salary history, but most states still allow such questions. Giving legal advice is outside the scope of this work, but you can read more about the types of things you can and cannot do in an interview process below.

  • age
