To get hired as a data scientist, don’t follow the herd
I still remember the moment my brother decided to sell his bitcoin. It was 2017, and we were at a Starbucks. We were approached by a middle-aged woman who was giving away pamphlets to anyone who would take one. “BITCOIN: a path to early retirement” was written in bold font at the top.
I was curious, so I asked her what she thought of the cryptocurrency market more generally, but it turned out that she knew of no other cryptocurrency besides bitcoin. Ethereum? “Never heard of it.” Litecoin? “That’s the cheap version of bitcoin, right?”
Now, as a rule of thumb, when even the clueless middle-aged lady at the local Starbucks is pitching you on the latest tech trend, you’re probably approaching peak hype. Or, if you prefer, a “bubble”.
This isn’t a new observation, of course. Everyone agrees that when it comes to investing, if you’re doing what everyone else is doing, you’re unlikely to see above-average returns. What’s weird, though, is that people fail to apply this same reasoning when it comes to investing in themselves.
Let’s suppose you want to get hired as a data scientist. If you’re doing all of the standard “I want to become a data scientist” things, then this means you shouldn’t expect to land your dream job. The market is currently full of junior talent, and as a result, the median aspiring data scientist is unlikely to get much traction. So if you want to avoid the median outcome, why do median things?
The problem is, most people don’t think this way when they embark on their data science journeys. I’ve spoken to literally hundreds of aspiring data scientists through my work at SharpestMinds, and about 80% of them have roughly the same story to tell:
1. First, they learn the ropes (Python + sklearn + Pandas + maybe some SQL or something)
2. Then, they take a cookie-cutter MOOC of some sort
3. They read a few job descriptions, get worried that they don’t have what it takes
4. Maybe take another MOOC, maybe start applying to jobs through a jobs board
5. Hear nothing back (or at best, bomb a few interviews)
6. Get frustrated, think about doing a Master’s, apply to some more jobs
7. Come to a decision point: do I repeat #2 through #7 until something different happens?
If this ever happens to you, odds are you’re in a self-improvement bubble too: you’re doing what everyone else is doing, but expecting a different outcome. The very first thing you need to do is stop.
If you want above-average outcomes, you can’t do average things. But to avoid doing average things, you need to know what the average things are.
Here are some examples: if you needed to do a MOOC to learn the ropes, that’s fine. But don’t get stuck in a MOOC spiral: MOOCs are, almost by definition, designed for the average person, so you won’t become an outstanding candidate by doing more of them. Likewise, if you have 4 or 5 Jupyter notebooks featuring the same boring sklearn/Pandas/seaborn/Keras stack on your GitHub, do not make another one.
Overall, the rule is: if something seems like an obvious next step because everyone else is doing it, that’s a great thing to not do. And conversely, you need to find the things that no one else is doing, and do those things as soon as possible.
What are those things? Based on what I’ve seen, about 5 come to mind:
- Replicate papers. This is especially true if you’re a deep learning buff. People don’t do this because it’s harder than grabbing a dataset and using a simple ANN or XGBoost to do cookie-cutter classification. Find the most interesting paper (ideally a relatively recent one) relevant to your field on the arXiv, and read it. Understand it. Then, replicate it, potentially on a new dataset. Write a blog post about it.
- Don’t get comfortable in your comfort zone. If you start a new project, it had better be to learn some new frameworks/libraries/tools. If you’re building your 6th Jupyter notebook that starts with `df = pd.read_csv(filename)` and ends with `f1 = f1_score(y_true, y_pred)`, it’s time to change your strategy.
- Learn boring things. Other people aren’t doing this because no one likes boring things. But a proper Git flow, Docker, building an app with Flask, and deploying models on AWS or Google Cloud are skills that companies desperately want applicants to have, and that a solid majority of applicants under-appreciate.
- Do annoying things. 1) Offer to present a paper at a local data science meetup. Or, at the very least, attend the local data science meetup. 2) Send cold messages to people on LinkedIn. Try to offer value upfront (“I just noticed a typo on your website”). DO NOT ASK THEM FOR A JOB RIGHT AWAY. Make your ask as specific as possible (“I’d love to get your feedback on my blog post”). You’re trying to build a relationship and expand your network, and that takes patience. 3) Attend conferences and network. 4) Start a study group.
- Do things that seem crazy. Everyone goes to the UCI repository, or uses some stock dataset (yawn) to build their project. Don’t do that. Learn how to use a web scraping library, or some under-appreciated API to build your own, custom dataset. Data is hard to come by, and companies often need to rely on their engineers to get it for them. Your goal should be to come across as the kind of data science-obsessed lunatic who will build your own goddamn dataset if that’s what it takes to get the job done.
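To make the "build your own dataset" idea a bit more concrete, here is a minimal sketch of the parsing half of a scraper, using only Python's standard library. Everything in it is an assumption for illustration: the `TableRowParser` class, the `html_table_to_rows` and `rows_to_csv` helpers, and the `SAMPLE_HTML` fragment (made-up placeholder values, not real data) are hypothetical names of mine, and a real project would probably download the page first and lean on a dedicated scraping library.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, ready to save and load into Pandas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical page fragment standing in for a downloaded page.
# The values are invented placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>city</th><th>listings</th></tr>
  <tr><td>Toronto</td><td>12</td></tr>
  <tr><td>Montreal</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    print(rows_to_csv(html_table_to_rows(SAMPLE_HTML)))
```

The point isn't this particular parser. It's that once the rows are in a CSV you collected yourself, the project starts from data nobody else has, rather than from the same stock dataset as everyone else's notebook.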
Each of these strategies is another way to stand out above the noise that recruiters face every day. None of them is a silver bullet, but each is a reliable way to get more traction on the data science job market, and to become a more capable data scientist.
At the end of the day, remember that when you’re building your skills, you’re investing in yourself. And that means that all the same economic principles that apply to investment apply here: if you want an outstanding outcome, you have to do outstanding things.
If you have questions about how to optimize your machine learning or data science trajectory, I’m always happy to chat. Just send me a DM on Twitter at @jeremiecharris :)