With the amount of available data doubling every 18 months and AI having advanced to the point where it can predict your key personality traits better than your own mother can, it’s no surprise that human beings are struggling to keep up.
Given the massive advances in technology over the past decade or so, there simply aren’t enough qualified data scientists to meet the demands of an increasingly analytical business sector. Unlike more recognized disciplines like Applied Mathematics or Computer Science, Data Science remains a bit of an enigma to many, requiring specialist expertise and understanding that few current qualifications offer.
As such, the world’s top data scientists tend to be swiftly snapped up by the big players like Amazon and Facebook, leaving other businesses to make do with what they can find. Not only that, but the majority of CTOs and others overseeing data science teams don’t themselves have the skills needed to assess the effectiveness of their recruits’ efforts, making it doubly difficult to achieve optimal business outcomes.
So with skills in short supply and expectations hard to establish, how do you know whether your data scientist is the real deal? Here are 5 common excuses that should raise a red flag and cause you to question the quality of your hire:
The dimensionality of data is too high
Many aspiring data scientists complain that they’re unable to draw conclusions because the dimensionality of their data is too high. Simply put, they’re suggesting that an excess of variables and a dearth of equations renders their task impossible. In reality, the problem is usually that the model is poorly conceived: your data scientist should be choosing the dimensionality themselves, adding or removing dimensions as needed to better predict the data. If they don’t understand this basic tenet of the job, it might be time to give them their marching orders.
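To make the point concrete, dimensionality is something you measure and then choose, not something that happens to you. The sketch below (a hedged illustration using synthetic data and an arbitrary 95% variance threshold, not a universal recipe) uses plain NumPy to find the smallest number of principal components that retains most of the variance in a nominally 50-dimensional dataset:

```python
import numpy as np

def choose_n_components(X, variance_to_keep=0.95):
    """Return the smallest number of principal components whose
    cumulative explained variance reaches the given threshold."""
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data give the per-component variances.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    explained = np.cumsum(variances) / variances.sum()
    return int(np.searchsorted(explained, variance_to_keep) + 1)

# Synthetic example: 200 samples recorded in 50 nominal dimensions,
# but the signal really lives in a 3-dimensional subspace plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
projection = rng.normal(size=(3, 50))
X = latent @ projection + 0.01 * rng.normal(size=(200, 50))

k = choose_n_components(X)  # a handful of dimensions, not 50
```

The "50-dimensional" data here is well explained by a few components; the competent move is to work in that smaller space rather than declare the problem impossible.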
Not enough processing power
The processing power required to solve a predictive model ultimately depends on how the data scientist frames it. For instance, if they’re trying to predict the next movie a consumer is likely to watch and create an equation with every movie in existence as a possible output, that would indeed require a fair amount of computing power. But such a model is not only unnecessary, it’s also unlikely to be effective, because it’s attempting to answer too much at once. By reducing the model to a yes/no output for each movie and then choosing appropriate sample sizes, data scientists can re-frame the problem so it’s readily solved with very little computing power. Essentially, for 95% of businesses, no effective equation should require more than a laptop to solve for a single iteration of the data – be very worried if it does.
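The reframing described above can be sketched in a few lines. In this hedged example (the user features, sample size, and movie scenario are all invented for illustration), a single movie gets its own cheap yes/no logistic-regression model, trained with plain NumPy on a modest sample of users rather than over every movie in existence:

```python
import numpy as np

def train_binary_classifier(X, y, lr=0.1, steps=500):
    """Logistic regression by gradient descent: one cheap yes/no
    model per movie instead of one giant every-movie model."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Hypothetical setup: predict "will this user watch movie M?" from
# two user features, using only a sample of 1,000 labelled users.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.1 * rng.normal(size=1000) > 0).astype(float)

w, b = train_binary_classifier(X, y)
preds = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
accuracy = np.mean(preds == y)  # trains in milliseconds on a laptop
```

Repeating this per candidate movie (or per shortlist of candidates) is a workload any laptop handles comfortably, which is exactly the point of the reframing.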
The data isn’t clean enough
This is an excuse encountered all too often, and while it can at times be valid, it should ring alarm bells if you hear it more than once. Of course, inconsistently measured data is of little value to anyone. However, it’s the data scientist’s job to clean the data, isolating inconsistencies and removing duplicates and anomalies. Sometimes it’s as simple as the same word being written differently (iPhone vs. iPhône, for example), something that can and should be easily solved by any data scientist worth their salary.
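The iPhone vs. iPhône case above really is a few lines of work. As a minimal sketch using only the Python standard library (the product labels are made up for illustration), Unicode decomposition folds accented variants down to plain ASCII so that spelling variants collapse to one key:

```python
import unicodedata
from collections import Counter

def normalize_label(text):
    """Fold accented variants to plain ASCII and unify case and
    whitespace, so 'iPhône' and ' iphone ' collapse to 'iphone'."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining accent marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.strip().lower()

raw = ["iPhone", "iPhône", " iphone ", "Pixel", "Pixel"]
counts = Counter(normalize_label(v) for v in raw)
# counts: {'iphone': 3, 'pixel': 2}
```

Real cleaning jobs involve more than accents, of course, but a data scientist who treats this class of problem as a blocker rather than routine work is the red flag the excuse reveals.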
The data is too complex
In cases where your data scientist makes this particular complaint, the likelier truth is that they’re trying to fit an inappropriate model to the data. For instance, they might be trying to perform user–item collaborative filtering (identifying similarities) on data points that are categorical, like yes/no answers – a task that will ultimately bear very little fruit. Your data scientist shouldn’t be trying to force data to perform within certain parameters, but rather choose an appropriate model to begin with.
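Choosing an appropriate model for the data can be as simple as choosing an appropriate similarity measure. As a hedged illustration (the user vectors are invented), Jaccard similarity is a natural fit for yes/no data, whereas correlation-based measures assume rating scales the data doesn’t have:

```python
def jaccard_similarity(a, b):
    """Similarity for yes/no vectors: shared 'yes' answers divided
    by the union of 'yes' answers. Suited to binary data, unlike
    Pearson correlation on ratings the data doesn't contain."""
    yes_a = {i for i, v in enumerate(a) if v}
    yes_b = {i for i, v in enumerate(b) if v}
    if not (yes_a | yes_b):
        return 0.0
    return len(yes_a & yes_b) / len(yes_a | yes_b)

# Hypothetical yes/no item vectors for three users.
user1 = [1, 1, 0, 1, 0]
user2 = [1, 1, 0, 0, 0]
user3 = [0, 0, 1, 0, 1]

sim_12 = jaccard_similarity(user1, user2)  # 2 shared / 3 total
sim_13 = jaccard_similarity(user1, user3)  # no overlap = 0.0
```

The point isn’t that Jaccard is the one right answer, but that the model should be chosen to match the data’s type, not the other way around.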
The project needs more time
A commonly experienced problem in the data science industry is lengthy delays in delivering projects. Given that very few CTOs have a sense of how long a task should take, missed deadlines go unchallenged and delays compound, to the detriment of the business. Simply put, if a project doesn’t have at least a temporary implementation within 4–6 weeks, it’s unlikely ever to be completed. Remember, you’re paying your data scientist to do a job, not to learn it as they go. So don’t settle for slow turnaround times: they mean your recruit is either slacking or woefully out of their depth.