What does a data scientist REALLY look like?

Using the responses from Stack Overflow's 2018 Annual Developer Survey, we attempt to build a portrait of the typical data scientist.

By Genevieve Hayes, Stitchdata.

Six years ago, the Harvard Business Review named

Figure 1: Comparison of gender (left) and age (right) distributions for data scientists (DS) vs non-data scientists (Non_DS)

As can be seen from Figure 1, the age and gender distributions of data scientist and non-data scientist respondents are almost identical. The average age of both data scientists and non-data scientists is 30.5 years, and 91% of data scientists are male, compared to 92% of non-data scientists.
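A comparison like the one in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code behind the figures: it assumes that respondents are flagged as data scientists when their semicolon-separated `DevType` answer contains the label shown below, the helper names are hypothetical, and the rows are toy values invented for the example rather than survey data.

```python
from collections import Counter

# Assumption: the DevType answer used here to flag data scientists.
DS_LABEL = "Data scientist or machine learning specialist"


def is_data_scientist(dev_type):
    """Return True if the semicolon-separated DevType answer lists DS_LABEL."""
    return DS_LABEL in (dev_type or "").split(";")


def distribution(rows, column):
    """Return {answer: percentage} for one column, skipping blank answers."""
    counts = Counter(row[column] for row in rows if row.get(column))
    total = sum(counts.values())
    return {answer: 100 * n / total for answer, n in counts.items()}


# Toy rows invented purely for illustration (NOT survey data).
toy_rows = [
    {"DevType": "Back-end developer;" + DS_LABEL, "Gender": "Male"},
    {"DevType": DS_LABEL, "Gender": "Female"},
    {"DevType": "Front-end developer", "Gender": "Male"},
    {"DevType": "Full-stack developer", "Gender": "Male"},
    {"DevType": "Mobile developer", "Gender": ""},  # blank answer is skipped
]

# Split respondents into the two groups compared throughout the article.
ds = [r for r in toy_rows if is_data_scientist(r["DevType"])]
non_ds = [r for r in toy_rows if not is_data_scientist(r["DevType"])]

print("DS:", distribution(ds, "Gender"))
print("Non_DS:", distribution(non_ds, "Gender"))
```

Percentages are computed over respondents who answered the question, so blank answers do not dilute either group's distribution.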

This suggests that, rather than attracting individuals from new demographics to computing and technology, the growth of data science jobs has merely created a new career path for those who were likely to become developers anyway.

Yet, comparing the educational backgrounds of data scientists and non-data scientists does reveal one key difference between these two groups.

Figure 2: Comparison of highest degree level distributions for data scientists (DS) vs non-data scientists (Non_DS)

Figure 2 shows that, contrary to popular belief, it is possible to become a data scientist without a master's or Ph.D. Even so, data scientists are much more likely than non-data scientists to hold an advanced degree: 45% of the data scientist respondents hold a master's or a Ph.D., compared to 23% of the non-data scientists.

This suggests a difference in skills required for data science and non-data science developer roles, with data science roles more likely to require skills that are taught as part of advanced degree programs.

Part 2: How do the coding skills differ between data scientists and non-data scientists?

The higher academic requirements employers place on data scientist roles raise the question: do employers also require more coding experience of their data scientists than of their non-data scientists?

Figure 3 shows that the opposite is, in fact, true.

Figure 3: Comparison of the distribution of professional coding experience for data scientists (DS) vs non-data scientists (Non_DS)

Data scientists typically have fewer years of professional coding experience than non-data scientist developers, with 62% of the data scientist respondents having five or fewer years of professional coding experience, compared to 57% of non-data scientists.
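A figure such as "five or fewer years" is a cumulative share over ordered experience bands. The sketch below shows one way to compute it, assuming experience is recorded as banded answers; the band labels are an abbreviated, assumed list, the function name is hypothetical, and the answers are toy values, not survey responses.

```python
from collections import Counter
from itertools import accumulate

# Assumed, abbreviated list of ordered experience bands.
EXPERIENCE_BINS = ["0-2 years", "3-5 years", "6-8 years", "9-11 years"]


def cumulative_pct(answers, ordered_bins):
    """Return {bin: % of respondents in this bin or any earlier bin}."""
    counts = Counter(a for a in answers if a in ordered_bins)
    total = sum(counts.values())
    if total == 0:
        return {}
    running = accumulate(counts[b] for b in ordered_bins)
    return {b: 100 * c / total for b, c in zip(ordered_bins, running)}


# Toy answers invented for illustration (NOT survey data); None = no answer.
toy_answers = ["0-2 years", "0-2 years", "3-5 years",
               "6-8 years", "9-11 years", None]

result = cumulative_pct(toy_answers, EXPERIENCE_BINS)
print(result["3-5 years"])  # share with five or fewer years: prints 60.0
```

Running the same function on each group separately gives the two percentages being compared.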

This suggests that employers do not demand more of data scientists in all respects. Instead, developer roles appear to involve a trade-off between coding experience and the sorts of technical skills that are taught in universities.

Yet, not all programming languages are created equal, and the programming languages data scientists and non-data scientists use in their day-to-day jobs are not necessarily the same.

Data scientists are more likely to use languages designed for, or with libraries for, statistical modelling and analysis, such as Python or R, while non-data scientists are more likely to program in languages associated with web development activities, such as HTML, CSS, and JavaScript.

For example, 77% of data scientists report having programmed in Python in the past year, compared to 35% of non-data scientists, while 72% of non-data scientists report having programmed in JavaScript in the past year, compared to 55% of data scientists.

This reflects the differences in the types of tasks commonly performed by data scientists, who typically focus on using statistics and modeling techniques to derive insights from data, versus non-data scientists, who are more likely to be involved in software engineering or web development-type activities.
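The language shares quoted above could be computed along the following lines. This is a minimal sketch, assuming each respondent's languages are stored as one semicolon-separated string in a column named `LanguageWorkedWith`; the function name is hypothetical and the rows are toy values, not survey data.

```python
def share_using(rows, language, column="LanguageWorkedWith"):
    """Percentage of answering respondents whose language list includes `language`."""
    # Split each answer into a list so that matching is on whole language names.
    answered = [r[column].split(";") for r in rows if r.get(column)]
    if not answered:
        return 0.0
    return 100 * sum(language in langs for langs in answered) / len(answered)


# Toy rows invented for illustration (NOT survey data).
toy_ds = [
    {"LanguageWorkedWith": "Python;R;SQL"},
    {"LanguageWorkedWith": "Python;JavaScript"},
    {"LanguageWorkedWith": "Java"},
    {"LanguageWorkedWith": "Python"},
]

print(share_using(toy_ds, "Python"))      # 3 of 4 respondents
print(share_using(toy_ds, "JavaScript"))  # 1 of 4 respondents
```

Splitting on the separator before matching matters here: a plain substring test would count every "JavaScript" user as a "Java" user.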

Part 3: Are data scientists more satisfied with their careers than non-data scientists?

If data scientist really is the best job to be in right now, then we would expect data scientists to be more satisfied than non-data scientists with both their jobs and their careers in general. And this is exactly what we observe from the data.

However, even though data scientists do tend to be more satisfied with both their jobs and their careers than non-data scientists, both groups tend to enjoy high levels of satisfaction in their jobs and careers.

Figure 4 shows that 73% of data scientists and 70% of non-data scientists are at least slightly satisfied with their jobs, while 74% of data scientists and 73% of non-data scientists are at least slightly satisfied with their careers.

Figure 4: Comparison of the job satisfaction (left) and career satisfaction (right) distributions for data scientists (DS) vs non-data scientists (Non_DS)
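"At least slightly satisfied" groups the top three answers on a satisfaction scale. A hedged sketch of that grouping follows; the answer labels are an assumption about how the scale is worded, the function name is hypothetical, and the answers are toy values, not survey responses.

```python
# Assumption: the three answers counted as "at least slightly satisfied".
SATISFIED = {"Extremely satisfied", "Moderately satisfied", "Slightly satisfied"}


def pct_at_least_slightly_satisfied(answers):
    """Percentage of non-blank answers that fall in the SATISFIED set."""
    answered = [a for a in answers if a]
    if not answered:
        return 0.0
    return 100 * sum(a in SATISFIED for a in answered) / len(answered)


# Toy answers invented for illustration (NOT survey data); None = no answer.
toy_answers = [
    "Extremely satisfied",
    "Slightly satisfied",
    "Neither satisfied nor dissatisfied",
    "Moderately dissatisfied",
    None,
]

print(pct_at_least_slightly_satisfied(toy_answers))  # 2 of 4 answers: prints 50.0
```

The same function applies to job and career satisfaction alike, provided both questions use the same scale.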

Therefore, even if a career in data science is not for you, any development-related role is likely to lead to levels of job and career satisfaction similar to those of the "best job in the US."

Conclusion

After exploring what it takes to land a job as a data scientist, and how this differs from landing a non-data scientist developer role, as well as comparing the levels of job and career satisfaction of people in these two groups, we found:

  1. Although data scientists and non-data scientists tend to come from similar demographic backgrounds (that is, predominantly young males), data scientists are more likely to have an advanced degree than non-data scientists but tend to have less professional coding experience.
  2. Data scientists are more likely to make use of statistical and modeling-focused programming languages, such as Python and R, than their non-data scientist counterparts, who tend to favor web development-focused languages, such as HTML, CSS, and JavaScript.
  3. Even though data scientists enjoy higher levels of job and career satisfaction than non-data scientists, both groups tend to be highly satisfied with their jobs and their careers.

Putting this all together, it seems that a typical data scientist is, therefore, the stereotypical nerdy male programmer: a male in his early 30s with an advanced degree and some professional experience programming in languages such as Python or R.

However, the fact that this is what a "typical" data scientist looks like now does not mean that this is what one will look like in the future. In fact, for the sake of the global economy, this image will have to change.

As mentioned previously, data science is a fast-growing profession where demand consistently outstrips supply and is expected to do so for many years to come.

The best way to meet this demand is for employers to look for ways to attract individuals from demographic groups that have traditionally been underrepresented in computer science and technology to this profession.

So, if you don't see yourself as fitting the "typical" data scientist mold, my advice is: don't be discouraged.

There is plenty of room in the data science profession for people of all backgrounds, and based on the levels of job and career satisfaction enjoyed by data scientists, the effort involved in developing the skills necessary to gain a data science role is well worth it.

After all, who wouldn't want to work in the "sexiest job of the 21st century"?

To learn more about this analysis, visit the GitHub repository for this project.

Bio: Genevieve Hayes is a data scientist and actuary from Melbourne, Australia. She has held positions in the insurance, education and government sectors.

Original. Reposted with permission.

Related:

  • Select Your Analytics Adventure – Analytics On-boarding
  • 10 Best Mobile Apps for Data Scientist / Data Analysts
  • Why do I Call Myself a Data Scientist?