Engineers can’t gauge their own interview performance. And that makes them harder to hire.

Note: This post is cross-posted from interviewing.io’s blog. interviewing.io is a company I founded that tries to make hiring suck less. I included it here because it seems like there’s a good amount of thematic overlap. And because there are some pretty graphs.

interviewing.io is an anonymous technical interviewing platform. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle. In the past few months, we’ve amassed over 600 technical interviews along with their associated data and metadata. Interview questions tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role at a top company, and interviewers typically come from a mix of larger companies like Google, Facebook, and Twitter, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more.

Over the course of the next few posts, we’ll be sharing some { unexpected, horrifying, amusing, ultimately encouraging } things we’ve learned. In this blog’s heroic maiden voyage, we’ll be tackling people’s surprising inability to gauge their own interview performance and the very real implications this finding has for hiring.

First, a bit about setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews. If both people find each other competent and pleasant, they have the option to unmask. Overall, interviewees tend to do quite well on the platform, with just under half of interviews resulting in a “yes” from the interviewer.

If you’re curious, you can see what the feedback forms look like below. As you can see, in addition to one direct yes/no question, we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. In this post, we’ll be focusing on the technical score an interviewer gives an interviewee and the interviewee’s self-assessment (both are circled below). For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

[Figure: Feedback form for interviewers]

[Figure: Feedback form for interviewees]
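
Concretely, you can think of each completed interview as producing a pair of feedback records shaped roughly like the sketch below. This is a simplified, hypothetical shape (the field names are ours, not our actual schema); it just captures the pieces of feedback this post talks about.

    from dataclasses import dataclass

    @dataclass
    class InterviewerFeedback:
        would_hire: bool        # the direct yes/no question
        technical_score: int    # 1-4 scale; 3 or above is the rough hirability cut-off

    @dataclass
    class IntervieweeFeedback:
        self_assessment: int    # 1-4 scale: how well they think they did
        would_work_with: bool   # would they want to work with the interviewer?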

Perceived versus actual performance

Below, you can see the distribution of people’s actual technical performance (as rated by their interviewers) and the distribution of their perceived performance (how they rated themselves) for the same set of interviews.[1]

You might notice right away that there’s a bit of a disparity, but things get interesting when you plot perceived vs. actual performance for each interview. Below is a heatmap of the data, where darker areas represent a higher concentration of interviews. For instance, the darkest square represents interviews where both perceived and actual performance were rated as a 3. You can hover over each square to see the exact interview count (denoted by “z”).
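
If you’d like to poke at data shaped like this yourself, here is a minimal sketch of how such a heatmap could be built with Plotly (the graphing tool we used for this post). The counts are invented purely for illustration; the real interview data isn’t embedded in this post.

    import plotly.graph_objects as go

    scores = [1, 2, 3, 4]  # the 1-4 scale from the feedback forms

    # counts[i][j] = number of interviews where perceived = scores[i]
    # and actual = scores[j] (hypothetical numbers).
    counts = [[ 2,  3,  1,  0],
              [ 4, 10,  8,  2],
              [ 3, 12, 25, 10],
              [ 1,  5, 14,  9]]

    fig = go.Figure(go.Heatmap(z=counts, x=scores, y=scores, colorscale="Blues"))
    fig.update_layout(xaxis_title="Actual performance (interviewer's rating)",
                      yaxis_title="Perceived performance (self-assessment)")
    fig.show()  # hovering over a square shows its interview count as "z"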

If you run a regression on this data[2], you get an R-squared of only 0.24, and once you take away the worst interviews, it drops even further, to 0.16. For context, R-squared is a measure of how well empirical data fit a mathematical model. It’s on a scale from 0 to 1, with 0 meaning that everything is noise and 1 meaning that everything fits perfectly. In other words, even though a small positive relationship between actual and perceived performance does exist, it’s not a strong, predictable correspondence.
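
To make the calculation concrete, here is a minimal sketch of an ordinary least-squares fit and its R-squared using scipy, run on a handful of made-up (perceived, actual) score pairs rather than our real data.

    import numpy as np
    from scipy import stats

    # Hypothetical (perceived, actual) technical scores on the 1-4 scale.
    perceived = np.array([3, 2, 4, 1, 3, 2, 3, 4, 2, 3])
    actual    = np.array([3, 3, 3, 2, 4, 1, 2, 3, 3, 4])

    # Linear regression of actual on perceived; rvalue is Pearson's r.
    fit = stats.linregress(perceived, actual)
    r_squared = fit.rvalue ** 2
    print(f"slope = {fit.slope:.2f}, R^2 = {r_squared:.2f}")

An R-squared in the 0.2 range means that knowing someone’s self-assessment tells you very little about the score their interviewer actually gave them.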

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer.

Gayle Laakmann McDowell of Cracking the Coding Interview fame has written quite a bit about how bad people are at gauging their own interview performance. It’s something I had noticed anecdotally when I was doing recruiting, so it was nice to see some empirical data on that front. In her writing, Gayle mentions that it’s the job of a good interviewer to make you feel like you did OK even if you bombed. I was curious whether that’s what was going on here, but when I ran the numbers, there wasn’t any relationship between how highly an interviewer was rated overall and how far off their interviewees’ self-assessments were, in either direction.

Ultimately, this isn’t a big data set, and we will continue to monitor the relationship between perceived and actual performance as we host more interviews. That said, the weak relationship emerged very early on and has persisted as the data set has grown; R-squared has never exceeded 0.26 to date.

Why this matters for hiring

Now here’s the actionable and kind of messed up part. As you recall, during the feedback step that happens after each interview, we ask interviewees whether they’d want to work with their interviewer. As it turns out, there’s a statistically significant relationship (p < 0.0008) between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you[3]. And by extension, it means that in every interview cycle, some portion of interviewees lose interest in joining your company just because they didn’t think they did well, even though they actually did.
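
We won’t reproduce the full analysis here, but one standard way to test for a relationship between two yes/no answers like these is a chi-squared test of independence on a 2x2 contingency table. Here is a minimal sketch with invented counts; only the shape of the test is the point.

    from scipy.stats import chi2_contingency

    # Hypothetical counts: rows = interviewee thought they did well / poorly,
    # columns = would / wouldn't want to work with the interviewer.
    table = [[90, 10],
             [45, 35]]

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}")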

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

Lastly, a quick shout-out to Statwing and Plotly for making terrific data analysis and graphing tools respectively.

[1] There are only 254 interviews represented here because not all interviews in our data set had comprehensive, mutual feedback. Moreover, we realize that raw scores don’t tell the whole story and will be focusing on standardization of these scores and the resulting rat’s nest in our next post. That said, though interviewer strictness does vary, we gate interviewers pretty heavily based on their background and experience, so the overall bar is high and comparable to what you’d find at a good company in the wild.

[2] Here we are referring to linear regression, and though we tried fitting a number of different curves to the data, they all sucked.

[3] In our data, people were 3 times less likely to want to work with their interviewers when they thought they did poorly.

Responses

  1. Noah Gibbs

    This matches my current beliefs, and is therefore a fantastic blog post. Otherwise, I wouldn’t have wanted to read it anyway 😉

    But seriously, it’s really good to have data to back this intuition up.

  2. Guy Alster

    The data suggests a difference between the way people rate themselves and the way their interviewers rate them. One conclusion is, as the post suggests, that people can’t really gauge their own interview performance. Another conclusion, which is what I believe after sitting on both sides of the table, is that interviewers can’t gauge the interviewee’s performance either. Interviews at tech companies all follow a similar pattern. You are asked an algorithmic question or a systems design question (based on your seniority) and are expected to come up with a solution that satisfies all the constraints presented by the interviewer. The solution is then written on a whiteboard and is expected to be bug free. So far so good. But here is the catch: this technique, while it seems like a good predictor of an engineer’s ability to solve problems, has a very high noise-to-signal ratio. Why? Because at the end of the day, it’s a human being who calls the shots. We humans are biased, unpredictable, and prone to “human errors.” Hence, two people answering the same question identically may end up getting different results with the same interviewer or with different interviewers. To really assess people objectively, the process would require removing the human factor. Obviously this is impossible, as the company wants to make sure that the engineer they are hiring is not a “robot” and can actually interact with other humans. So I believe that the conclusion of this post is only half right.
