How many data scientists are there and is there a shortage?

We examine the famous McKinsey prediction from 2011, look into whether there is a shortage of people with analytical expertise, and estimate how many Data Scientists there are.
By Gregory Piatetsky, KDnuggets.

(this blog was jointly written with Preet Gandhi, NYU)

The 2011 McKinsey report on Big Data said that “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data.”

In 2014, we examined "How Many Data Scientists are out there?" and came up with an estimate of 50,000-100,000, and did not see much evidence of a massive shortage then. In 2014, we found only about 1,000 job ads for "Data Scientist" on indeed.com.
In 2016, we examined a Deloitte study that predicted that Businesses Will Need One Million Data Scientists by 2018.

Now that we have reached 2018, we can examine how accurate those predictions were and try to answer three questions:

  1. Is there a shortage of Data Scientists now?
  2. How many "Data Scientists" are there, both in name and in function?
  3. What are the future prospects for Data Scientists?

1. Is there a shortage of Data Scientists?

The answer to the first question appears to be yes.

The LinkedIn Workforce Report for the US (August 2018) says:

“Demand for data scientists is off the charts ... data science skills shortages are present in almost every large U.S. city. Nationally, we have a shortage of 151,717 people with data science skills, with particularly acute shortages in New York City (34,032 people), the San Francisco Bay Area (31,798 people), and Los Angeles (12,251 people).”

Note that LinkedIn reports shortages of people with "Data Science Skills", not necessarily people with the "Data Scientist" title.

We can estimate the demand for "Data Scientists" from two popular job search sites: Indeed and Glassdoor.

A search on indeed.com for “data scientist” (in quotes) in the USA finds only about 4,800 jobs.

Note: using quotes is important for searches on Indeed. A search for data scientist without quotes finds about 30,000 jobs, but we are not sure how many of those jobs are for scientists in other areas.

The US is the largest but not the only market for Data Scientists. We can also see strong demand for Data Scientists elsewhere, for example by checking regional Indeed sites (indeed.co.uk, indeed.fr, indeed.de, indeed.co.in, etc.):

  • UK: 1,100 jobs
  • France: 718 jobs
  • Germany: 900 jobs
  • India: 500 jobs
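
As a rough sanity check, the Indeed counts quoted above can be totalled in a few lines of Python. This is only a sketch: the figures are the approximate counts reported in this post, and the variable names are ours, chosen for illustration.

```python
# Approximate Indeed job counts for "data scientist" (in quotes),
# as quoted in this post. Variable names are illustrative only.
us_jobs = 4_800
regional_jobs = {"UK": 1_100, "France": 718, "Germany": 900, "India": 500}

# Total for the four non-US markets listed above.
non_us_total = sum(regional_jobs.values())

# US plus the four listed markets (not a global total).
combined_total = us_jobs + non_us_total

print(f"Four non-US markets combined: {non_us_total:,}")
print(f"US plus those four markets:   {combined_total:,}")
print(f"Non-US markets relative to the US count: {non_us_total / us_jobs:.0%}")
```

Under these inputs the four regional sites add up to roughly two thirds of the US count, which covers only the listed countries and not worldwide demand.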

A Glassdoor search for "Data Scientist" finds about 26,000 jobs in the USA (same results if quotes are removed).

2. How many "Data Scientists" are there?

Google search defines a data scientist as

“a person employed to analyze and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making.”

There are many people in industry and academia who do this work without having the formal title of data scientist, since Data Science is an interdisciplinary field at the intersection of Statistics, Computer Science, Machine Learning, and Business. We can estimate the current population of Data Scientists by examining popular data science platforms.

Kaggle (now part of Google) is a platform for data science and analytics competitions. It claims to be the world’s largest community of active data scientists. While not all Data Scientists take part in Kaggle competitions or have a Kaggle account, and not all Kagglers do data science work, it is reasonable to assume a large overlap. In June 2017, the Kaggle community crossed 1 million members, and a Kaggle email on Sep 19, 2018 says they surpassed 2 million members in August 2018. Since not all Kaggle members are active, Kaggle membership is probably a global upper bound for the number of people engaged in data science.

KDnuggets now reaches over 500,000 unique visitors per month. Given our focus on helping Data Scientists and Machine Learning Engineers do their jobs better, we think it is reasonable to assume that the majority of our visitors work in the Data Science / Machine Learning area, regardless of their job title. Since some visitors may stumble on KDnuggets randomly, we can look instead at subscribers / followers, a more active subset.

KDnuggets now has about 240,000 subscribers/followers across Twitter, LinkedIn, Facebook, RSS, and email. While there is some overlap, about 200,000 seems a reasonable lower bound for the number of Data Scientists globally.

On LinkedIn, there are many groups dedicated to data science, and although the engagement in those groups has been falling, we can use their membership as a rough estimate. Here are three of the largest groups:

  • Big Data and Analytics - 339,000
  • Data Science Central - 278,000
  • Data Mining, Statistics, Big Data, Data Visualization, and Data Science - 170,000

Examining the titles of members, we see great diversity. The titles include Data Scientist, Data Analyst, Statistician, Bioinformatician, Neuroscientist, Marketing executive, Computer scientist, etc. It is safe to say that anyone who does the tasks that a conventional data scientist does can be counted in this category. With the growing need to analyze data to derive insights or make key decisions, people with traditionally different job titles and responsibilities are keen to learn new data analysis techniques suited to their domains. This doesn’t make them primarily data scientists, but they do possess the knowledge and skills of the field.

We can also get useful information from LinkedIn's profile of Data Scientists, which shows over 100,000 people with this title.


Fig. 1: LinkedIn Data Scientist profile, by industry and by location.

Searching LinkedIn for “data scientist” (quotes are important), we find over 100,000 people with that actual title. So if globally between 200,000 and 1,000,000 people are doing some Data Science related work, then most of them do not have a Data Scientist title.
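
The comparison above comes down to a simple ratio. The following Python sketch computes it at both ends of the range; it assumes the rounded figures quoted in this post (the LinkedIn count is reported only as "over 100,000"), and the function and variable names are hypothetical.

```python
# Rounded figures quoted in this post; names are illustrative only.
titled = 100_000                # LinkedIn profiles titled "Data Scientist" (reported as "over 100,000")
practitioners_low = 200_000     # lower end of the estimated range
practitioners_high = 1_000_000  # upper end of the estimated range


def titled_share(titled_count: int, practitioner_count: int) -> float:
    """Fraction of practitioners who hold the Data Scientist title."""
    return titled_count / practitioner_count


share_at_high = titled_share(titled, practitioners_high)
share_at_low = titled_share(titled, practitioners_low)

print(f"Share with the title if 1,000,000 practitioners: {share_at_high:.0%}")
print(f"Share with the title if 200,000 practitioners:   {share_at_low:.0%}")
```

With these inputs the titled share falls between roughly 10% and 50%, so the share without the title grows as the true population moves toward the upper end of the range.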

We can also estimate the size of the larger data analysis/visualization/statistics community by looking at activity around the languages and platforms most connected to Data Science: R, Python, Machine Learning libraries, Spark, and Jupyter. Apache Spark Meetups recently had 225K members, and that number is growing every month. Intel Capital estimated that there are 1 million R programmers worldwide. Public data on the python.org website shows around 2.75 million downloads. The Jupyter project has around 3 million users at present. These numbers give us a rough upper limit on the number of data analysts/data scientists around the world.

3. Future Prospects for Data Scientists

The near-term future for Data Scientists looks bright.

The LinkedIn 2017 Emerging Jobs Report claims that the number of machine learning engineers working today has grown 9.8 times compared to 5 years ago. Machine Learning Engineers, Data Scientists, and Big Data Engineers rank among the top emerging jobs on LinkedIn. Data scientist roles have grown over 650% since 2012.


Fig. 2: Top 10 emerging jobs on LinkedIn and their growth from 2012 to 2017.

Job growth in the next decade is expected to outstrip growth during the previous decade, creating 11.5M jobs in the Data Science/Analytics area by 2026, according to the U.S. Bureau of Labor Statistics.

Data Science Analytics Landscape

IBM recently claimed that by 2020 the number of Data Science and Analytics job listings is projected to grow by nearly 364,000 listings to approximately 2,720,000. Whatever the true number of data professionals currently is, it is likely to grow in the near future.
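
The IBM figures imply a starting point that the projection does not state directly. The short Python sketch below backs it out; it assumes the two rounded numbers quoted above, and the baseline year is not given in this post, so the result is only indicative.

```python
# Rounded figures from the IBM projection quoted above; names are illustrative.
projected_listings_2020 = 2_720_000  # approximate projected listings by 2020
projected_increase = 364_000         # "nearly 364,000" additional listings

# Baseline implied by subtracting the increase from the projected total.
implied_baseline = projected_listings_2020 - projected_increase

# Relative growth over the (unstated) projection period.
implied_growth = projected_increase / implied_baseline

print(f"Implied baseline listings: {implied_baseline:,}")
print(f"Implied relative growth:   {implied_growth:.1%}")
```

Under these assumptions the projection corresponds to a baseline of about 2.36 million listings and growth of roughly 15% over the projection period.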

In the long term, however, automation will replace many jobs in the industry, and the Data Scientist job will not be an exception. Already today, companies like DataRobot and H2O offer automated solutions to Data Science problems.

Respondents to the KDnuggets 2015 Poll expected that most expert-level Predictive Analytics/Data Science tasks will be automated by 2025. To stay employed, Data Scientists should focus on developing skills that are harder to automate, like business understanding, explanation, and storytelling.

Related:

  • Machine Learning Engineer, Data Scientist – top US emerging jobs
  • Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science in 2016
  • Data Mining (and Statistical Analysis) is LinkedIn Hottest Skill in 2014
  • How Many Data Scientists are out there?
  • Data Scientists Automated and Unemployed by 2025?
  • Businesses Will Need One Million Data Scientists by 2018