Hire from your user base

Recruiting great engineers is hard. You might guess that the hardest part is finding great people, but with the advent of LinkedIn, GitHub, and a number of search aggregators, it’s actually not too bad. The really hard part is finding great people who are likely to be interested in what you’re doing. To that end, wouldn’t it be great if you could reach out specifically to engineers who are already familiar with your product? It turns out that with a list of user emails and a few scripts, you can.

Companies with brands that are household names are doing this already — when they do cold reachouts, those reachouts are actually warm — chances are, anyone they contact will have used or heard of their product at some point. However, for companies who aren’t yet household names, cold reachouts are not that effective. Personalization of messages can help, sure, but execution can be difficult, either because the person writing the messages doesn’t have enough technical depth to do this right or because there’s not enough info available on the internets about the recipient to say anything meaningful.

So, how do you find engineers who are already familiar with your offering? All you need is a list of your users’ emails. If you don’t have that list, you’re probably doing it wrong. Also, if you’re a B2B company, you probably shouldn’t be poaching engineers from people who give you money, but I’ll leave questions of case-specific ethics as an exercise for the reader.

Once you have your user list, you can filter it to find the engineers by checking each email against GitHub to see if there’s a corresponding account. There are 2 ways to do this. One is free but information-poor and slow. The other will cost you a few hundred bucks but will provide a lot more info about people and will be orders of magnitude faster. Choose your weapon.

To get all the code mentioned in this post, go to my GitHub.

Free but slow

One way to see if someone is an engineer is to ping GitHub directly. This is something you’re probably already doing anyway when you source, but doing it manually is slow and intractable on a large user list. Instead, here’s a Python script that pings GitHub to see if there are GitHub accounts associated with your users’ emails.

import json
import urllib
import urllib2
import base64

def call_github(email):
  """Calls GitHub with an email.
  Returns a user's GitHub URL if exactly one account matches.
  Otherwise returns None."""
  url = 'https://api.github.com/search/users'
  # Only consider accounts with at least one repo (see note below)
  values = {'q': '{0} in:email type:user repos:>0'.format(email)}

  # TODO insert your credentials here or read them from a config file or whatever
  # First param is username, 2nd param is passwd
  # Without credentials, you'll be limited to 5 requests per minute
  auth_info = '{0}:{1}'.format('', '')

  basic = base64.b64encode(auth_info)
  headers = {'Authorization': 'Basic ' + basic}
  params = urllib.urlencode(values)
  req = urllib2.Request('{0}?{1}'.format(url, params), headers=headers)
  response = json.loads(urllib2.urlopen(req).read())

  # Only trust an unambiguous result: exactly one account for this email
  if response and response['total_count'] == 1:
    return response['items'][0]['html_url']
  else:
    return None

I chose to only consider people who have at least one repo. You can be more liberal and look at everyone, but empirically, profiles with 0 repos have been pretty useless. You can also filter further by limiting your search to people who have some number of followers, code in a certain language, etc. The full list of params is here. Note that although location is listed as a filter and can be good (e.g. if you can’t support remote workers and if it’s not H-1B season), the search is not as useful as it could be because location is a self-reported, optional field, and users who don’t specify location will not be included in the results.
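For example, a stricter query might look like the sketch below. The extra qualifiers (followers, language, location) are all standard GitHub search syntax, but the thresholds here are made up, so tune them to your own bar.

# Same structure as the query in call_github above, with a few extra
# (hypothetical) qualifiers layered on. Remember that filtering on location
# silently drops users who never filled in that field.
values = {'q': '{0} in:email type:user repos:>3 followers:>10 '
               'language:javascript location:"san francisco"'.format(email)}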

One of the big downsides of this approach is contending with GitHub’s rate limit: 20 requests/min with credentials or 5 requests/min without. To put things in perspective, if you have 500,000 email addresses to go through, at 20 requests/minute, you’re looking at about 18 days for something like this to run. With that in mind, if you’re dealing with a long list of emails, you could potentially use a number of GitHub credentials and load balance requests across them. At the very least, I’d suggest spinning up an EC2 instance and running your script in a separate screen.
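If you go the multi-credential route, here’s a minimal sketch of the idea. The credential pool is hypothetical, and it assumes call_github has been modified to accept a username and password instead of hardcoding them.

import itertools
import time

# Hypothetical pool of GitHub accounts: one (username, password) pair each.
CREDENTIALS = [('user1', 'passwd1'), ('user2', 'passwd2'), ('user3', 'passwd3')]

def lookup_all(emails, per_account_per_minute=20):
  """Round-robins lookups across the credential pool, sleeping just enough
  to keep each account under its per-minute rate limit."""
  creds = itertools.cycle(CREDENTIALS)
  delay = 60.0 / (per_account_per_minute * len(CREDENTIALS))
  results = {}
  for email in emails:
    username, passwd = creds.next()
    results[email] = call_github(email, username, passwd)
    time.sleep(delay)
  return results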

Not free but verbose and fast

Though the GitHub approach isn’t bad, it has 2 serious drawbacks. First, a GitHub account by itself may not be enough info about someone. You’ll probably want to find the person’s LinkedIn to see where they work and how long they’ve been there, as well as their personal site/blog if one exists (and probably other stuff that comprises their web presence)1. Second, GitHub’s rate limit is pretty crippling, as you saw above.

If you don’t want to deal with either of these drawbacks and are down to shell out a bit of cash, Sourcing.io is pretty terrific. It’s a lightweight search aggregator that pulls in info about engineers from GitHub, Twitter, and a slew of other sources, provides merit scores based on GitHub activity and Twitter followings, and includes links to pretty much everything you’d want to know about a person inline (LinkedIn, Twitter, GitHub, personal website, etc). Unlike many of its competitors, it has supremely elegant UX and non-enterprise pricing. And, most importantly, it has a fantastic API that lets you search by email and, at least at the time of this posting, doesn’t have firm rate limits. As I mentioned, in addition to returning someone’s GitHub profile, the API will provide you with a bunch of other useful info if it exists. Here’s an example response to give you an idea of some of the fields that are available:

{
    "id":"f8d0595a-af59-41e9-b991-1da5c1e7a60d",
    "slug":"alex-maccaw",
    "name":"Alex MacCaw",
    "first_name":"Alex",
    "last_name":"MacCaw",
    "email":"info@eribium.org",
    "url":"http://alexmaccaw.com",
    "score":82.2,
    "headline":"Ruby developer at Sourcing",
    "bio":"",
    "educations":[],
    "twitter":"maccaw",
    "github":"maccman",
    "linkedin":"/pub/alex-maccaw/78/929/ab5",
    "facebook":null,
    "languages":[
        "Ruby",
        "JavaScript",
        "CoffeeScript"
    ],
    "frameworks":[
        "Rails",
        "Node",
        "OSX",
        "Cocoa",
        "iOS"
    ],
    "location":"San Francisco, CA, USA",
    "company_name":"Sourcing",
    ...
}

And here are teh codez for getting someone’s title, homepage, LinkedIn, GitHub, and Twitter from their email:

import json
import urllib2

def call_sourcingio(email):
  """Calls Sourcing.io with an email.
  Returns a user's title and relevant URLs, if any exist.
  Otherwise returns None."""
  try:
    url = 'https://api.sourcing.io/v1/people/email/{0}'

    # TODO insert your API key here
    key = ''

    request = urllib2.Request(url.format(email))
    request.add_header('Authorization', 'Bearer {0}'.format(key))
    response_object = urllib2.urlopen(request)
    response = json.loads(response_object.read())
  except urllib2.HTTPError, err:
    if err.code == 404:
      return None  # no profile exists for this email
    else:
      raise err

  # Some fields can be null (see the example response above), so guard
  # against building URLs out of missing values. Note that the linkedin
  # field already starts with a slash.
  return {
    'headline': response['headline'],
    'linkedin': 'https://www.linkedin.com{0}'.format(response['linkedin']) if response['linkedin'] else None,
    'github': 'https://github.com/{0}'.format(response['github']) if response['github'] else None,
    'twitter': 'https://twitter.com/{0}'.format(response['twitter']) if response['twitter'] else None,
    'url': response['url']
  }

Going through your result set with less pain

Once you have your result set, you have to go through all your users’ GitHub accounts to figure out who’s worth contacting and what to say to them. In most cases, this result set will be pretty small because most of your users probably won’t be engineers, so going through it by hand isn’t the worst thing in the world. That said, there are still a few things you can do to make your life easier.

I ended up writing a script that iterated through every user who had a GitHub account and printed them to the terminal as it went. Each time a new user came up, my script would open a new browser tab with their GitHub account ready to go (and potentially other URLs). It would also copy the current user’s email address to the clipboard so that if I decided to contact them, I’d have one less thing to do. Note that to use Pyperclip (the thing that handles copying to the clipboard), you have to install it first.

import json
import sys
import webbrowser

import pyperclip

def parse_potential_engineers(results_filename):
  """Goes through potential engineers one by one"""
  json_data = open(results_filename)
  results = json.load(json_data)
  json_data.close()

  for email in results:
    if results[email] is None:
      continue

    print email
    print results[email]

    webbrowser.open(results[email]['github'])  # open GitHub page in a new tab
    pyperclip.copy(email)  # copy email address to clipboard
    sys.stdin.readline()  # wait for Enter before moving on to the next person
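To tie the pieces together, here’s roughly how I’d wire everything up. The file names are hypothetical: one email per line in user_emails.txt, and results keyed by email in results.json, which is the format parse_potential_engineers() expects.

# Hypothetical glue code (assumes the imports and functions defined above).
emails = [line.strip() for line in open('user_emails.txt') if line.strip()]

results = {}
for email in emails:
  results[email] = call_sourcingio(email)

with open('results.json', 'w') as f:
  json.dump(results, f)

parse_potential_engineers('results.json')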

Don’t be awful

As a final word of caution, I’ll say this. If you’re the one using these scripts, you’re either an engineer or have been one at some point, and you know how shitty it is to be inundated with soulless recruiter spam. These tactics give you the opportunity to find out enough about the people you’ll be emailing to write something meaningful. Talk about their projects and how those projects relate to what your company is doing. Talk about why someone would want to work for your company beyond just stupid perks or how well-funded you are. And if you find engineers who have great blogs or GitHubs but no pedigree, give them a chance. In other words, please use your newfound powers for good.

EDIT: After some good discussion, I think it’s worth pointing out that while a GitHub account isn’t a bad signal for whether someone is an engineer, the absence of one most certainly is NOT a signal that someone is not an engineer, and it’s certainly not a signal that someone is not a great one. Several of the best engineers I know simply don’t have side projects or don’t throw their side projects’ source code out into the world or aren’t involved in the open source community. In other words, I would hate for someone to interpret this post as validation that a GitHub account is a good gating mechanism for job applicants.

 

1Tools like Rapportive and Connectifier can make your life easier when it comes to getting additional context about people, but Connectifier isn’t cheap, and Rapportive requires toggling Gmail. You can try using Rapportive programmatically — Jordan Wright did build a pretty great Python wrapper around Rapportive’s undocumented API, and I was trying to use it initially to get more info about users — but that thing gets throttled real fast-like.

Looking for a job yourself? Work with a recruiter who’s a former engineer and can actually understand what you’re looking for. Drop me a line at aline@alinelerner.com.

Review of Lever

Note: This post was adapted from a review I wrote on Quora.

This is a review of Lever, my favorite applicant tracking system (ATS).

ATSs suck

There are a lot of bad ATSs out there. I’ve tried out most of them at this point, and I’m consistently shocked at how they manage to stay in business. Their ability to survive is a testament to how terrible the recruiting software space is. Hiring software is really hard to build correctly because as soon as you get some traction, your big-name enterprise customers start demanding one-off customizations in exchange for fat wads of cash, and that’s hard to say no to. But as soon as you go down the perilous rabbit hole of building custom features, you lose out on usability for the rest of your customer base. As a result, building software can easily devolve into a game of checking as many feature boxes as possible. This is especially true in enterprise sales because the person who ultimately makes the purchasing decisions isn’t necessarily the same person who actually uses the product in question.

An extreme example of this is Taleo, which purports to do everything from sourcing and managing the interview pipeline to onboarding candidates and reviewing employees. On paper, it checks all the boxes. It’s the perfect system. But in reality, the product pretty much fails at everything. So instead of getting a system that does everything, you end up with a hefty subscription fee for a system that does nothing useful at all. By the time you realize this, they have all your data. Migration is a nightmare, so you just grin and bear it because they have you by the balls.

It’s absurd how companies that are otherwise really good at things continue to give money to bloated, clunky, borderline unusable products like Taleo and Jobvite.

Lever is awesome

Lever isn’t like that. You know how sometimes you use a product and just feel really good because you can tell it was made by people who get it? It’s so rare, and it’s even rarer in the B2B/enterprise software space. With Lever, it’s immediately clear that UX was a priority from day one and that the people who built it took the time to understand the recruiting space rather than just frankensteining together features that sound good on paper. Everything about the Lever experience is deliberate. The product is moving toward a vision of recruiting that I really love — there’s no process for the sake of process, just a system that unobtrusively gives you access to all the info you need and lets you intuitively collaborate with anyone on your team who’s involved in hiring.

Below is some stuff about Lever I really like.

Amazing LinkedIn (and other site) integration
This feature is ridiculously useful. If you’re looking at someone’s LinkedIn profile, Lever’s browser extension can scrape the page and automatically create a corresponding entry in Lever. On top of that, if I’m looking at someone’s LinkedIn profile, and a candidate with the same name already exists in Lever, Lever alerts me and asks if they’re the same person.

This feature is fantastic for collaboration around sourcing, especially if a few people are sourcing within your organization. In the status quo, candidate information is probably going to be in some mix of different people’s emails, LinkedIn, StackOverflow, your current ATS, social media, and so on. This dispersal of information makes it difficult to effectively track who’s been reached out to. The low-friction candidate import features in Lever make it possible for all this info to live in one place, ultimately making it much easier to track response rates and prevent noob mistakes like contacting the same person multiple times without context.

In addition to LinkedIn, you can use the extension with pretty much any site (e.g. GitHub), though not all sites will have a preconfigured scraper.

Snapshot of state
When you log in, you can see exactly who’s in what part of the pipeline.

When you drill down into an individual candidate, you can see everything that’s happened until now inline, including all candidate communication, as well as internal notes and interview feedback. Moreover, any activity that goes with a candidate is always associated with an actor and a timestamp so that you can begin to ask and answer questions about your actual internal process and how it’s going. You’d think this is a given, but I haven’t seen any other product nail this aspect.

It’s not gross to use for non-recruiters
If you want someone else on your team to do something related to a candidate, you just @ them, and the conversation ends up in email. This is particularly awesome because it’s intuitive for people on your team who aren’t recruiters. In general, the feel of the product is about fitting in with Google Apps rather than making you do things in a bunch of places.

The Lever experience also tries to mirror how people would actually operate during hiring if the hiring process weren’t muddled with, well, process. The notion of filling out forms to submit feedback is absent, and though you can customize your process in any way that you want, the idea that you have to rank candidates on some scale or fill in ten different form fields about candidates’ aptitude isn’t something that’s taken for granted. The product doesn’t make any assumptions. All it does is remove the friction associated with storing information and having a bunch of different people interact with it, whether they’re used to doing hiring-related stuff or not.

Ability to snooze candidates and delay rejection emails
Lever has a few Boomerang-like features I really like. Let’s say you reach out to someone over email, and they respond to tell you that they’re stuck at their current gig for the next 6 months. There are a number of home-grown ways to deal with this in the status quo, but none of them are centralized. Lever actually lets you “snooze” candidates for some amount of time and resurface them when it’s time to make contact again. You even get a calendar invite reminder.

Another great feature is rejection email delay. I hate looking at a candidate, deciding they’re not a fit, and then having to remember to reject them later because it’s kind of terrible to reject someone 5 minutes after they apply. To avoid this, Lever lets you schedule rejections for some time in the future.

Things I don’t like

Lever isn’t without shortcomings. The product is extremely young. Bugs crop up from time to time. And there are lots of features that are missing. Here are some shortcomings I noticed.

Absence of tracking on outbound emails and job postings
It would be cool if you could track when candidates open emails and track click-through rate on job postings.

Inconsistent candidate duplication detection
As I mentioned above, Lever’s LinkedIn browser extension is very good at catching candidates you’re looking at in LinkedIn who might also exist in Lever. Unfortunately, if you enter candidates manually, and you already have someone in the system with the same name, there isn’t any warning asking you if this is intentional.

Somewhat clunky setup process
When you initially set up Lever, you have to talk with someone on the team to get it configured for your company’s pipeline/interview process. Though this isn’t unpleasant, ideally there’d be a way to self-serve any configuration stuff through an admin site.

Stop giving money to shitty products

At the end of the day, I’d caution users not to fall into the trap of comparing a product to its platonic ideal rather than to the status quo. In terms of what’s out there at the moment, and given how great Lever is already, I have no idea why everyone wouldn’t use it. Yeah, migration is kind of a bitch (though Lever does its best to make it less painful), and yeah, it’s kind of risky to depend on a very young company that might not be around in a few years, but I see moving away from a terrible ATS to Lever not only as a rational decision but as a proud rejection of clunky enterprise systems that make users miserable. Sunk costs, fear, and complacency are not good reasons to make decisions. Awesomeness is.

Silicon Valley hiring is not a meritocracy

Note: This post was adapted from an answer I wrote on Quora.

I’ve been hiring people in some capacity for the past 3 years. First, I was doing it as an engineer, then as an in-house recruiter, and now as the owner of my own technical recruiting firm. Until recently, I was quite sure that the startup world was as meritocratic as something could reasonably be.

I was wrong. Most hiring managers pay lip service to how they’re looking for strong CS fundamentals, passion, and great projects over pedigree, but in practice, the system breaks down. The sad truth is that if you don’t look great on paper and you’re applying to a startup that has a strong brand, unless you know someone in the company, the odds of you even getting an interview are very slim.

People focus on the technical interview itself as a potential source of unfairness in the hiring process, but I’m going to focus on something that I think is even more of a problem: just getting your foot in the door.

How the system breaks down

Let’s say you’re the technical co-founder of a new startup that has had some traction, and there’s a backlog of work piling up. Fortunately, you’ve raised enough money to hire a few engineers. Chances are, you’re the sole technical person, so you’re probably the one working on engineering recruiting full-time. This sucks a bit for you because it’s likely not what you think you’re best at, and it sucks a bit for the company because of opportunity cost. However, you’re probably doing a pretty good job because 1) you’re really good at selling the vision of the company because you’re so vested in it and 2) you have the technical chops and intuition to evaluate people based on some set of internal heuristics you’ve acquired through experience. Because of these heuristics, you’re probably going to end up letting in people who seem smart even if they don’t look great on paper.

Eventually, things are going well, your startup gets some more funding, and you decide you want to go back to doing what you think is real work. You hire an in-house recruiter to deal with your hiring pipeline full-time.

This is where things start going south. Because recruiters are, generally speaking, not technical, instead of relying on some internal barometer for competence, they have to rely on quickly identifiable attributes that function as a proxy for aptitude. OK, you think, I’m going to give my recruiter(s) some guidelines about what good candidates look like. You might even create some kind of hiring spec to help them out.

These hiring specs, whether a formal document or just a list of criteria, tend to focus on candidate attributes that maximize the odds of the candidate being good while minimizing the specialized knowledge it takes to draw these conclusions. Examples of these attributes include:

  • CS degree from a top school
  • having worked at a top company
  • knowledge of specific languages/frameworks1
  • some number of years of experience

This system works… sort of. If your company has a pretty strong brand, you can afford a high incidence rate of false negatives because there will always be a revolving door of candidates. And while these criteria aren’t great, they clearly perform well enough to perpetuate their existence.

Unfortunately, this system creates a massive long tail of great engineers who are getting overlooked even in the face of the perceived eng labor shortage.

What about side projects?

Folk wisdom suggests that having a great portfolio of side projects can help get you in the door if you don’t look great on paper. I wish this were more true. However, unless your side project is high profile, easy for a layperson to understand, or built with the API of the company you’re applying to, it will probably get overlooked.

The problem is the huge variety of side projects and how indistinguishable they appear to a non-technical recruiter. To such a person, a copy-and-pasted UI tutorial with a slight personal spin is the same thing as a feature-complete, from-scratch JavaScript framework. Telling the difference between these kinds of projects is somewhat time-consuming for someone with a technical background and almost impossible for someone who’s never coded before. Therefore, while awesome side projects are a HUGE indicator of competence, if the people reading resumes can’t tell the difference between awesome and underwhelming (either because of lack of domain-specific knowledge or because of time considerations), the signal gets lost in the noise.

To be clear, I am not discouraging building stuff. Building stuff on your own time is a great way to learn because you run into all sorts of non-deterministic challenges and gotchas that you wouldn’t have otherwise. Few things will better prepare you for the coding interviews that test if you know how the web works, have decent product sense, can design DB schema, and so on. It’s just that having built stuff is no guarantee of even getting your foot in the door.

What can we do about this?

Bemoaning that non-technical people are the first to filter candidates is silly because it’s not going to change. What can change, however, is how they do the filtering. There are a few things I can think of to fix this.

  • Figure out which attributes are predictors of success, going beyond low-hanging fruit like pedigree. This is hard. I tried to do it, and Google’s done it. I wish more companies did this kind of thing and published their results.
  • Find other ways of evaluating candidates that are at least as cheap and effective as resume filtering but are actually more primary indicators of competence.
  • Establish a free/low-cost elite set of CS classes that anyone can get into but that most cannot complete, because of the classes’ difficulty and stringent grading. Build this into a brand strong enough for companies to trust. This way merit can effectively be tied to pedigree. This is also really hard. I know a number of companies are attacking this space, though, and I am excited to see how hiring will change in the next few years as a result.

My second suggestion is the one I will talk about here because it’s the least hard to implement and because pieces of it are already in place.

To avoid being judged on pedigree alone, in lieu of the traditional resume/cover letter, applicants who don’t look great on paper could apply to a specific company with 1) a coding challenge and 2) a writing sample.

Ideally the coding challenge could be scored quickly and automatically. It should also be an interesting problem that gives some insight into the company’s engineering culture and the kinds of problems they’re solving rather than a generic FizzBuzz. If the coding challenge is too textbook, a certain subset of smart people aren’t going to want to waste their time on it. Tools to automate coding evaluation already exist (e.g. InterviewStreet, Codility, and Hackermeter), but they’re not currently baked into the application process the right way. In the case of InterviewStreet and Codility, candidates have to already be in a given company’s pipeline to be able to work on the coding challenges, which means that anyone who looks bad on paper will be cut before getting to write a single line of code. In the case of Hackermeter, candidates code first and hope to be matched up with a company later. This approach has a lot of potential, but unless Hackermeter can establish themselves as the definitive startup application go-to, candidates will always be limited by Hackermeter’s client list. The ideal solution will probably be a more decentralized version of Hackermeter that companies can administer from their jobs page.
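To make the scoring idea concrete, here’s a toy sketch of the run-against-hidden-tests piece. Real tools sandbox untrusted code and do far more; everything here, including the sample challenge, is made up for illustration.

def score_submission(candidate_fn, test_cases):
  """Runs a candidate's function against hidden (args, expected) test cases
  and returns the fraction passed. A crash counts as a failed case."""
  passed = 0
  for args, expected in test_cases:
    try:
      if candidate_fn(*args) == expected:
        passed += 1
    except Exception:
      pass
  return float(passed) / len(test_cases)

# A hypothetical challenge: run-length encode a string.
def sample_submission(s):
  return ''.join('{0}{1}'.format(ch, s.count(ch)) for ch in sorted(set(s)))

hidden_tests = [(('aaabcc',), 'a3b1c2'), (('',), '')]
print score_submission(sample_submission, hidden_tests)  # prints 1.0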

The idea for the writing sample came from the study I conducted. Of the many attributes I tested, the three that achieved statistical significance were grammatical mistakes, whether the candidate worked at a top company, and whether you could tell from their resume what they did at each of their previous positions. Two out of the three attributes, in other words, had to do with a candidate’s written communication skills.

For the writing sample, I am imagining something that isn’t a cover letter — people tend to make those pretty formulaic and don’t talk about anything too personal or interesting. Rather, it should be a concise description of something you worked on recently that you are excited to talk about, as explained to a non-technical audience. I think the non-technical audience aspect is critical because if you can break down complex concepts for a non-technical audience, you’re probably a good communicator and actually understand what you worked on. Moreover, recruiters could actually read this description and make valuable judgments about whether the writing is good and whether they understand what the person did. If recruiters can be empowered to make real judgments rather than acting as human keyword matchers, they’d probably be a lot better at their jobs. In my experience, many recruiters are very smart, but their skills aren’t being used to their full potential.

The combination of a successfully completed coding challenge and an excellent writing sample should be powerful enough to get someone on the phone for a live coding round.

The problem of surfacing diamonds in the rough is hard, but I think these relatively small, realistic steps can go a long way toward helping find them and making eng hiring into the meritocracy that we all want it to be.

1This is definitely very relevant for specialized roles, but I see it a lot for full-stack generalist kinds of roles as well.

Looking for a job yourself? Work with a recruiter who’s a former engineer and can actually understand what you’re looking for. Drop me a line at aline@alinelerner.com.

Review of CoderPad

Note: This post was adapted from a review I originally wrote on Quora.

I recently found out about CoderPad, a collaborative coding tool that lets you run your code as you go and is particularly handy for technical interviews. Here’s what I thought of it.

Interviewing is hard

I’ve been on both sides of the table for a fair number of technical interviews—from phone screens and live coding rounds to in-person whiteboard sessions. For both the interviewee and the interviewer, the live coding round can be particularly draining for a variety of reasons.

As an interviewee, you’re faced with a variety of stumbling blocks during this round. For one, you’re working with an IDE or editor that’s likely an inferior version of what you’re used to. In a real-life situation, it’s also unlikely that you would write an entire function or class without testing it every so often. As a result, the cycles spent on working in an unfamiliar environment and without your usual testing routine can detract from what matters: showing what you can do.

As an interviewer, it’s a balancing act: staying engaged while a candidate muddles through a problem, keeping the candidate from straying too far down the wrong path, and stopping yourself from spoon-feeding them solutions. You’re simultaneously tracking their process—namely how long it took the candidate to come up with a game plan, when they turned that game plan into a shitty solution, and when they took the clumsy solution and made it elegant. And unless you want to look like a n00b, you’ll have to be able to catch new bugs and evaluate creative solutions effectively.

Coding interviews are, at their best, a proxy for actual on-the-job aptitude. At their worst, they’re poor amalgams of real work environments where you’re stripped of the ability to syntax-highlight and run your damn code.

So, yeah, interviewing is hard.

The other guys

To address some of these issues, it helps to have a tool that can do some of the heavy lifting for you.

One of these tools is Google Docs, which many companies still use to conduct live coding rounds. On the upside, Google Docs is extremely versatile, and the ability to draw can come in handy if part of the coding round is conceptual or high level. Working against it: lack of indentation, line numbers, and syntax highlighting. Oh, and you can’t run your code.

A big improvement over that is a tool called Collabedit. Collabedit has a slick UI and provides support for nearly every language you’d need. However, candidates still have to code blind—it can’t run code either.

CoderPad

Enter CoderPad. CoderPad is a collaborative editor with REPL built in. It’s not the first product to feature collaborative coding, and it’s not the first product to feature live REPL, but it is the first product I’ve seen that utilizes these two elements really, really well.

In other words, CoderPad allows both the interviewer and interviewee to run code as it’s being written.


CoderPad supports a number of interpreted and compiled languages—which is pretty awesome. In addition to more closely mimicking how people actually work, it takes the heat off the interviewer a bit so they can focus on whether the candidate is a good fit.

Cool features include:

  • Nice aesthetics: syntax highlighting, line numbers, indentation
  • Great language coverage for both compiled and interpreted languages, including JavaScript, Python, Ruby, Java, Scala, C/C++, and Go
  • Really beautiful/slick UI
  • Ability to include as many collaborators as you want
  • Playback feature so you can see how people got there, rather than just the end code
  • Reasonable pricing scheme

Some limitations/nice-to-haves:

  • Ability to add timestamps as the candidate works so you can track progression
  • Ability to unshare code with candidate after the interview is over
  • Faster compile times (interpreter is really fast)

Despite these minor limitations, as far as I know, there isn’t another collaborative coding tool with live REPL out there that approaches CoderPad’s level of polish and utility. You should give it a spin next time you’re interviewing someone.

Lessons from a year’s worth of hiring data

I ran technical recruiting at TrialPay for a year before going off to start my own agency. Because I used to be an engineer, one part of my job was conducting first-round technical interviews, and between January 2012 and January 2013, I interviewed roughly 300 people for our back-end/full-stack engineer position.

TrialPay was awesome and gave me a lot of freedom, so I was able to use my intuition about whom to interview. As a result, candidates ranged from self-taught college dropouts or associate’s degree holders to PhD holders, ACM winners, MIT/Harvard/Stanford/Caltech students, and Microsoft, Amazon, Facebook, and Google interns and employees with a lot of people in between.

While interviewing such a wide cross section of people, I realized that I had a golden opportunity to test some of the prevalent folk wisdom about hiring. The results were pretty surprising, so I thought it would be cool to share them. The strongest signal turned out to be something most hiring managers overlook: the number of errors on a resume. And the least surprising thing that I was able to confirm was that having worked at a top company matters.

Of course, a data set of size 300 is a pittance, and I’m a far cry from a data scientist. Most of the statistics here was done with the help of Statwing and with Wikipedia as a crutch. With the advent of more data and more rigorous analysis, perhaps these conclusions will be proven untrue. But, you gotta start somewhere.

Why any of this matters

In the status quo, most companies don’t run exhaustive analyses of hiring data, and the ones that do keep it closely guarded and only share vague generalities with the public. As a result, a certain mysticism persists in hiring, and great engineers who don’t fit in “the mold” end up getting cut before another engineer has the chance to see their work.

Why has a pedigree become such a big deal in an industry that’s supposed to be a meritocracy? At the heart of the matter is scarcity of resources. When a company gets to be a certain size, hiring managers don’t have the bandwidth to look over every resume and treat every applicant like a unique and beautiful snowflake. As a result, the people doing initial resume filtering are not engineers. Engineers are expensive and have better things to do than read resumes all day. Enter recruiters or HR people. As soon as you get someone who’s never been an engineer making hiring decisions, you need to set up proxies for aptitude. Because these proxies need to be easily detectable, things like a CS degree from a top school become paramount.

Bemoaning that non-technical people are the first to filter resumes is silly because it’s not going to change. What can change, however, is how they do the filtering. We need to start thinking analytically about these things, and I hope that publishing this data is a step in the right direction.

Method

To sort facts from folk wisdom, I isolated some features that were universal among resumes and would be easy to spot by technical and non-technical people alike and then ran statistical significance tests on them. My goal was to determine which features were the strongest signals of success, which I defined as getting an offer. I ran this analysis on people whom we decided to interview rather than on every applicant; roughly 9 out of 10 applicants were screened out before the first round. The motivation there was to gain some insight into what separates decent candidates from great ones, which is a much harder question than what separates poor candidates from great ones.

Certainly there will be some sampling bias at play here, as I only looked at people who chose to apply to TrialPay specifically, but I’m hoping that TrialPay’s experience could be a stand-in for any number of startups that enjoy some renown in their specific fields but are not known globally. It also bears mentioning that this is a study into what resume attributes are significant when it comes to getting hired rather than when it comes to on-the-job performance.

Here are the features I chose to focus on (in no particular order):

  • BS in Computer Science from a top school (as determined by U.S. News and World Report)
  • Number of grammatical errors, spelling errors, and syntactic inconsistencies
  • Frequency of buzzwords (programming languages, frameworks, OSes, software packages, etc.)
  • How easy it is to tell what someone did at each of their jobs
  • Highest degree earned
  • Resume length
  • Presence of personal projects
  • Work experience in a top company
  • Undergraduate GPA
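For what it’s worth, here’s a minimal sketch of the kind of significance test involved. I actually used Statwing rather than code, and the numbers below are made up; it just shows how you’d compare the distribution of a feature, like error count, across the offer and no-offer groups.

# Mann-Whitney U test on a made-up feature: number of resume errors,
# split by interview outcome. Requires SciPy.
from scipy.stats import mannwhitneyu

errors_offer = [0, 1, 0, 2, 1, 0, 0, 1]     # hypothetical offer group
errors_no_offer = [3, 0, 4, 2, 5, 1, 6, 2]  # hypothetical no-offer group

u_stat, p_value = mannwhitneyu(errors_offer, errors_no_offer)
print 'U = {0}, p = {1}'.format(u_stat, p_value)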

TrialPay’s hiring bar and interview process

Before I share the actual results, a quick word about context is in order. TrialPay’s hiring standards are quite high. We ended up interviewing roughly 1 in 10 people that applied. Of those, after several rounds of interviewing (generally a phone screen followed by a live coding round followed by onsite), we extended offers to roughly 1 in 50, for an ultimate offer rate of 1 in 500. The interview process is pretty standard, though the company shies away from asking puzzle questions that depend on some amount of luck/clicking to get the correct answer. Instead, they prefer problems that gradually build on themselves and open-ended design and architecture questions. For a bit more about what TrialPay’s interview process looks like, check out Interviewing at TrialPay 101.

The results

Now, here’s what I discovered. In the chart below, bar height represents effect size, and every feature with a bar was statistically significant. These results were quite surprising, and I will try to explain and provide more info about some of the more interesting stuff I found.

[Bar chart: effect size and p-value of each statistically significant resume feature]

Number of errors

The most significant feature by far was the presence of typos, grammatical errors, or syntactic inconsistencies.

Errors I counted included everything from classic transgressions like mixing up “its” and “it’s” to typos and bad comma usage. In the figure below, I’ve created a fictional resume snippet to highlight some of the more common errors.

[Figure: fictional resume snippet with common errors annotated]

This particular result was especially encouraging because it’s something that can be spotted by HR people as well as engineers. When I surveyed 30 hiring managers about which resume attributes they thought were most important, however, no one ranked number of errors highest. Presumably, hiring managers don’t think that this attribute is that important for a couple of reasons: (1) resumes that are rife with mistakes get screened out before even getting to them and (2) people almost expect engineers to be a bit careless with stuff like spelling and grammar. With respect to the first point, keep in mind that the resumes in this analysis were only of people whom we decided to interview. With respect to the second point, namely that engineers shouldn’t be held to the same writing standards as people in more humanities-oriented fields, I give you my next chart. Below is a breakdown of how resumes that ultimately led to an offer stacked up against those that didn’t. (Here, I’m showing the absolute number of errors, but when I ran the numbers against number of errors adjusted for resume length, the results were virtually identical.)


[Histograms: number of errors on resumes that led to an offer vs. those that didn’t]

As you can see, the distributions look quite different between the group of people who got offers and those that didn’t. Moreover, about 87% of people who got offers made 2 or fewer mistakes.

In startup situations, not only are good written communication skills extremely important (a lot of heavy lifting and decision making happens over email), but I have anecdotally found that being able to write well tends to correlate very strongly with whether a candidate is good at more analytical tasks. Not submitting a resume rife with errors is a sign that the candidate has strong attention to detail, which is an invaluable skill when it comes to coding, where there are all manner of funky edge cases and where you’re regularly called upon to review others’ code and help them find obscure errors that they can’t seem to locate because they’ve been staring at the same 10 lines of code for the last 2 hours.

It’s also important to note that a resume isn’t something you write on the spot. Rather, it’s a document that you have every opportunity to improve. You should have at least 2 people proofread your resume before submitting it. When you do submit, you’re essentially saying, “This is everything I have done. This is what I’m proud of. This is the best I can do.” So make sure that that is actually true, and don’t look stupid by accident.

Top company

No surprises here. The only surprise is that this attribute wasn’t more significant. Though I’m generally not too excited by judging someone on pedigree, having been able to hold down a demanding job at a competitive employer shows that you can actually, you know, hold down a demanding job at a competitive employer.

Of all the companies that our applicants had on their resumes, I classified the following as elite: Amazon, Apple, Evernote, Facebook, Google, LinkedIn, Microsoft, Oracle, any Y Combinator startup, Yelp, and Zynga.

Undergraduate GPA

After I ran the numbers to try to figure out whether GPA mattered, the outcome was a bit surprising: GPA appeared to not matter at all. Take a look at the GPA distribution for candidates who got offers versus candidates who didn’t.

As a caveat, it’s worth mentioning that roughly half of our applicants didn’t list their GPAs on their resumes, so not only is the data set smaller, but there are probably some biases at play. I did some experiments with filling in the missing data and separating out new grads, and I will discuss those results in a future post.

Is it easy to tell what the candidate actually did?

Take a look at this role description:

[Example role description #1: clear and specific]

Now take a look at this one:

[Example role description #2: vague industry lingo]

In which of these is it easier to tell what the candidate did? I would argue that the first snippet is infinitely more clear than the second. In the first, you get a very clear idea of what the product is, what the candidate’s contribution was in the context of the product, and why that contribution matters. In the second, the candidate is using some standard industry lingo as a crutch — what he said could easily be applied to pretty much any software engineering position.

Judging each resume along these lines certainly wasn’t an exact science, and not every example was as cut-and-dried as the one above. Moreover, while I did my best to avoid confirmation bias while deciding whether I could tell what someone did, I’m sure that the system wasn’t perfect. All this said, however, I do find this result quite encouraging. People who are passionate about and good at what they do tend to also be pretty good at cutting to the chase. I remember the feeling of having to write my resume when I was looking for my first coding job, and I distinctly remember how easily words flowed when I was excited about a project versus when I knew inside that whatever I had been working on was some bullshit crap. The latter case is when words like “software development life cycle” and a bunch of acronyms reared their ugly heads… a pitiful attempt to divert the reader from lack of substance by waving a bunch of impressive-sounding terms in his face.

This impression is further confirmed by word clouds generated from the resumes of candidates who received an offer versus those who didn’t. For these clouds, I took words that appeared very frequently in one data set relative to how often they appeared in the other.

[Word cloud: words overrepresented in resumes that led to an offer]

[Word cloud: words overrepresented in resumes that did not lead to an offer]

As you can see, “good” resumes focused much more on action words/doing stuff (“manage”, “ship”, “team”, “create”, and so on), whereas “bad” resumes focused much more on details, specific technologies, and techniques.
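The scoring behind the clouds boils down to a relative-frequency ratio, and one way to sketch it is below; the +1 smoothing and the minimum-count cutoff are my own assumptions.

from collections import Counter

def overrepresented(words_a, words_b, min_count=5):
  """Scores each word in corpus A by how overrepresented it is relative to
  corpus B and returns words sorted by that ratio, highest first."""
  freq_a, freq_b = Counter(words_a), Counter(words_b)
  total_a, total_b = float(len(words_a)), float(len(words_b))
  scores = {}
  for word, count in freq_a.items():
    if count < min_count:
      continue  # skip rare words
    rate_a = count / total_a
    rate_b = (freq_b[word] + 1) / total_b  # +1 smoothing for unseen words
    scores[word] = rate_a / rate_b
  return sorted(scores, key=scores.get, reverse=True)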

Highest degree earned

Though highest degree earned didn’t appear to be significant in this particular data set, there was a definite trend that caught my attention. Take a look at the graph of offers extended as a function of degree.

As you can see, the higher the degree, the lower the offer rate. I’m confident that with the advent of more data (especially more people without degrees and with master’s degrees), this relationship will become more clear. I believe that self-motivated college dropouts are some of the best candidates around because going out of your way to learn new things on your own time, in a non-deterministic way, while juggling the rest of your life is, in some ways, much more impressive than just doing homework for 4 years. I’ve already ranted quite a bit about how worthless I find most MS degrees to be, so I won’t belabor the point here.1

BS in Computer Science from a top school

But wait, you say, even if highest degree earned doesn’t matter, not all BS degrees are created equal! And, having a BS in Computer Science from a top school must be important because it’s in every fucking job ad I’ve ever seen!

And to you I say, Tough shit, buddy. Then I feel a bit uncomfortable using such strong language, in light of the fact that n ~= 300. However, roughly half of the candidates (122, to be exact) in the data set were sporting some fancy pieces of paper. And yet, our hire rate was not too different between people who had said fancy pieces of paper and those who didn’t. In fact, in 2012, half of the offers we made at TrialPay were to people without a BS in CS from a top school. This doesn’t mean that every dropout or student from a third-rate school is an unsung genius — there were plenty that I cut before interviewing because they hadn’t done anything to offset their lack of pedigree. However, I do hope that this finding gives you a bit of pause before taking the importance of a degree in CS from a top school at face value.

[Chart: offer rates for candidates with and without a BS in CS from a top school]

In a nutshell, when you see someone who doesn’t have a pedigree but looks really smart (has no errors/typos, very clearly explains what they worked on, shows passion, and so forth), do yourself a favor and interview them.

Personal projects

Of late, it’s become accepted that you should have some kind of side project in addition to whatever you’re doing at work, and this advice becomes especially important for people who don’t have a nice pedigree on paper. Sounds reasonable, right? Here’s what ends up happening. To game the system, applicants start linking to virtually empty GitHub accounts that are full of forked repos where they, at best, fixed some silly whitespace issue. In other words, it’s like 10,000 forks when all you need is a glimmer of original thought.

Yay forks.

Outside of that, there’s the fact that not all side projects are created equal. I can find some silly tutorial for some flashy UI thing, copy the code from it verbatim, swap in something that makes it a bit personal, and then call that a side project on my resume. Or I can create a new, actually useful JavaScript framework. Or I can spend a year bootstrapping a startup in my off hours and get it up to tens of thousands of users. Or I can arbitrarily call myself CTO of something I spaghetti-coded in a weekend with a friend.

Telling the difference between these kinds of projects is somewhat time-consuming for someone with a technical background and almost impossible for someone who’s never coded before. Therefore, while awesome side projects are a HUGE indicator of competence, if the people reading resumes can’t tell the difference between awesome and underwhelming (either because of lack of domain-specific knowledge or because of time considerations), the signal gets lost in the noise.

Conclusion

When I started this project, it was my hope that I’d be able to debunk some myths about hiring or at least start a conversation that would make people think twice before taking folk wisdom as gospel. I also hoped that I’d be able to help non-technical HR people get better at filtering resumes so that fewer smart people would fall through the cracks. Some of my findings were quite encouraging in this regard because things like typos/grammatical errors, clarity of explanation, and whether someone worked at an elite company are all attributes that a non-technical person can parse. I was also especially encouraged by undergraduate pedigree not necessarily being a signal of success. At the end of the day, spotting top talent is extremely hard, and much more work is needed. I’m optimistic, however. As more data becomes available and more companies embrace the spirit of transparency, proxies for aptitude that don’t stand up under scrutiny will be eliminated, better criteria will take their place, and smart, driven people will have more opportunities to do awesome things with their careers than ever before.

Acknowledgements

A huge thank you to:

  • TrialPay, for letting me play with their data and for supporting my ideas, no matter how silly they sounded.
  • Statwing, for making statistical analysis civilized and for saving me from the horrors of R (or worse, Excel).
  • Everyone who suggested features, helped annotate resumes, or proofread this monstrosity.

EDIT: See Hacker News for some good discussion.

1It is worth mentioning that my statement about MS degrees potentially being a predictor of poor interview performance does not contradict this data — when factoring in other roles I interviewed for, especially more senior ones like Director of Engineering, the (negative) relationship is much stronger.

Looking for a job yourself? Work with a recruiter who’s a former engineer and can actually understand what you’re looking for. Drop me a line at aline@alinelerner.com.