Hire from your user base

Recruiting great engineers is hard. You might guess that the hardest part is finding great people, but with the advent of LinkedIn, GitHub, and a number of search aggregators, it’s actually not too bad. The really hard part is finding great people who are likely to be interested in what you’re doing. To that end, wouldn’t it be great if you could reach out specifically to engineers who are already familiar with your product? It turns out that with a list of user emails and a few scripts, you can.

Companies with brands that are household names are doing this already — when they do cold reachouts, those reachouts are actually warm — chances are, anyone they contact will have used or heard of their product at some point. However, for companies who aren’t yet household names, cold reachouts are not that effective. Personalization of messages can help, sure, but execution can be difficult, either because the person writing the messages doesn’t have enough technical depth to do this right or because there’s not enough info available on the internets about the recipient to say anything meaningful.

So, how do you find engineers who are already familiar with your offering? All you need is a list of your users’ emails. If you don’t have that list, you’re probably doing it wrong. Also, if you’re a B2B company, you probably shouldn’t be poaching engineers from people who give you money, but I’ll leave questions of case-specific ethics as an exercise for the reader.

Once you have your user list, you can filter it to find the engineers by checking each email against GitHub to see if there’s a corresponding account. There are 2 ways to do this. One is free but is also information-poor, and slow. The other will cost you a few hundred bucks but will provide a lot more info about people and will be orders of magnitude faster. Choose your weapon.

To get all the code mentioned in this post, go to my GitHub.

Free but slow

One way to see if someone is an engineer is to ping GitHub directly. This is something you’re probably already doing anyway when you source, but doing it manually is slow and intractable on a large user list. Instead, here’s a Python script that pings GitHub to see if there are GitHub accounts associated with your users’ emails.

import json
import urllib
import urllib2
import base64

def call_github(email):
  """Calls GitHub with an email
  Returns a user's GitHub URL, if one exists.
  Otherwise returns None """
  url = 'https://api.github.com/search/users'
  values = {'q' : '{0} in:email type:user repos:>0'.format(email) }

  # TODO insert your credentials here or read them from a config file or whatever
  # First param is username, 2nd param is passwd
  # Without credentials, you'll be limited to 5 requests per minute
  auth_info = '{0}:{1}'.format('','')

  basic = base64.b64encode(auth_info)
  headers = { 'Authorization' : 'Basic ' + basic }
  params = urllib.urlencode(values)
  req = urllib2.Request('{0}?{1}'.format(url,params), headers=headers)
  response = json.loads(urllib2.urlopen(req).read())

  if response and response['total_count'] == 1:
    return response['items'][0]['html_url'] 
  else:
    return None 

I chose to only consider people who have at least one repo. You can be more liberal and look at everyone, but empirically profiles with 0 repos have been pretty useless. You can also filter further by limiting your search to people who have some number of followers, code in a certain language, etc. The full list of params is here. Note that although location is listed as a filter and can be good (i.e. if you can’t support remote workers and if it’s not H-1B season), the search is not as useful as it could be because location is a self-reported, optional field, and users who don’t specify location will not be included in the results.

One of the big downsides of this approach is contending with GitHub’s rate limit: 20 requests/min with credentials or 5 requests/min without. To put things in perspective, if you have 500,000 email addresses to go through, at 20 requests/minute, you’re looking at about 18 days for something like this to run. With that in mind, if you’re dealing with a long list of emails, you could potentially use a number of GitHub credentials and load balance requests across them. At the very least, I’d suggest spinning up an EC2 instance and running your script in a separate screen.

Not free but verbose and fast

Though the GitHub approach isn’t bad, it has 2 serious drawbacks. First, just a GitHub account may not be enough info about someone in and of itself. You’ll probably want to find the person’s LinkedIn to see where they work and how long they’ve been there, as well as their personal site/blog if one exists (and probably other stuff that comprises their web presence)1. Secondly, GitHub’s rate limit is pretty crippling, as you saw above.

If you don’t want to deal with either of these drawbacks and are down to shell out a bit of cash, Sourcing.io is pretty terrific. It’s a lightweight search aggregator that pulls in info about engineers from GitHub, Twitter, and a slew of other sources, provides merit scores based on GitHub activity and Twitter followings, and includes links to pretty much everything you’d want to know about a person inline (LinkedIn, Twitter, GitHub, personal website, etc). Unlike many of its competitors, it has supremely elegant UX and non-enterprise pricing. And, most importantly, it has a fantastic API that lets you search by email and, at least at the time of this posting, doesn’t have firm rate limits. As I mentioned, in addition to returning someone’s GitHub profile, the API will provide you with a bunch of other useful info if it exists. Here’s an example response to give you an idea of some of the fields that are available:

{
    "id":"f8d0595a-af59-41e9-b991-1da5c1e7a60d",
    "slug":"alex-maccaw",
    "name":"Alex MacCaw",
    "first_name":"Alex",
    "last_name":"MacCaw",
    "email":"info@eribium.org",
    "url":"http://alexmaccaw.com",
    "score":82.2,
    "headline":"Ruby developer at Sourcing",
    "bio":"",
    "educations":[],
    "twitter":"maccaw",
    "github":"maccman",
    "linkedin":"/pub/alex-maccaw/78/929/ab5",
    "facebook":null,
    "languages":[
        "Ruby",
        "JavaScript",
        "CoffeeScript"
    ],
    "frameworks":[
        "Rails",
        "Node",
        "OSX",
        "Cocoa",
        "iOS"
    ],
    "location":"San Francisco, CA, USA",
    "company_name":"Sourcing",
    ...
}

And here are teh codez for getting someone’s title, homepage, LinkedIn, GitHub, and Twitter from their email:

import json
import urllib2

def call_sourcingio(email):
  """Calls Sourcing.io with an email
  Returns a user's title and relevant URLs, if any exist.
  Otherwise returns None """
  try:
    url = 'https://api.sourcing.io/v1/people/email/{0}'

    # TODO insert your API key here
    key = ''

    request = urllib2.Request(url.format(email))
    request.add_header('Authorization', 'Bearer {0}'.format(key))
    response_object = urllib2.urlopen(request)
    response = json.loads(response_object.read())
  except urllib2.HTTPError, err:
    if err.code == 404:
      return None
    else:
      raise err

  return { 
  'headline': response['headline'], \
  'linkedin': 'https://www.linkedin.com/{0}'.format(response['linkedin']), \
  'github': 'https://github.com/{0}'.format(response['github']), \
  'twitter': 'https://twitter.com/{0}'.format(response['twitter']), \
  'url': response['url'] 
  }

Going through your result set with less pain

Once you have your result set, you have to go through all your users’ GitHub accounts to figure out who’s worth contacting and what to say to them. In most cases, this result set will be pretty small because most of your users probably won’t be engineers, so going through it by hand isn’t the worst thing in the world. That said, there are still a few things you can do to make your life easier.

I ended up writing a script that iterated through every user who had a GitHub account and printed them to the terminal as it went. Each time a new user came up, my script would open a new browser tab with their GitHub account ready to go (and potentially other URLs). It would also copy the current user’s email address to the clipboard so that if I decided to contact them, I’d have one less thing to do. Note that to use Pyperclip (the thing that handles copying to the clipboard), you have to download it first.

import json
import webbrowser
import pyperclip

def parse_potential_engineers(results_filename):
  """Goes through potential engineers one by one"""
  json_data = open(results_filename)
  results = json.load(json_data)
  json_data.close()

  for email in results:
    if results[email] == None:
      continue

    print email
    print results[email]

    webbrowser.open(results[email]['github']) # open GitHub page in a new browser window
    pyperclip.copy(email) # copy email address to clipboard
    sys.stdin.readline()

Don’t be awful

As a final word of caution, I’ll say this. If you’re the one using these scripts, you’re either an engineer or have been one at some point, and you know how shitty it is to be inundated with soulless recruiter spam. These tactics give you the opportunity to find out enough about the people you’ll be emailing to write something meaningful. Talk about their projects and how those projects relate to what your company is doing. Talk about why someone would want to work for your company beyond just stupid perks or how well-funded you are. And if you find engineers who have great blogs or GitHubs but no pedigree, give them a chance. In other words, please use your newfound powers for good.

EDIT: After some good discussion, I think it’s worth pointing out that while a GitHub account isn’t a bad signal for whether someone is an engineer, the absence of one most certainly is NOT a signal that someone is not an engineer, and it’s certainly not a signal that someone is not a great one. Several of the best engineers I know simply don’t have side projects or don’t throw their side projects’ source code out into the world or aren’t involved in the open source community. In other words, I would hate for someone to interpret this post as validation that a GitHub account is a good gating mechanism for job applicants.

 

1Tools like Rapportive and Connectifier can make your life easier when it comes to getting additional context about people, but Connectifier isn’t cheap, and Rapportive requires toggling Gmail. You can try using Rapportive programmatically — Jordan Wright did build a pretty great Python wrapper around Rapportive’s undocumented API, and I was trying to use it initially to get more info about users — but that thing gets throttled real fast-like.

Looking for a job yourself? Work with a recruiter who’s a former engineer and can actually understand what you’re looking for. Drop me a line at aline@alinelerner.com.

One Response to “Hire from your user base”

  1. Roberto Martinez

    Aline,

    I like the title of your blogpost. It is attractive, the only problem I see is what you mention regarding a small user base. When you are a startup nobody knows you. At this stage you must master recruiting as a CEO or co-founder in any position at your company. Here is a good article that can help to start mastering this skill. http://nearsoft.com/blog/do-you-need-help-hiring-3-developers/

    BTW, the tools that you mentioned are also a great source, but kind of costly.

    Reply

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>