By Karlijn Willems, Data Science Journalist & DataCamp Contributor.
One of the most popular questions in the data science field, apart from ‘What is data science?’, is ‘How do I learn data science?’. It’s not just a question that comes from those who are new to data science, but also from those who have already been around for some time. The road to the “sexiest job of the 21st Century”, or the “best job of the year” for 2016, is clearly not as smooth or straightforward as one would think.
At DataCamp, our students learn data science by doing. But we have also noticed that they continue to ask these questions. You can find a lot of opinions and advice from seasoned veterans on the Internet, but this jungle of information is not making things easier for beginners. This post is meant to present a general overview of the eight steps that you need to go through to learn data science.
The goal is not to give an exhaustive list, but rather to offer a guide for everyone who is interested in learning data science, and for everyone who is already a data scientist or part of a data science team but wants some additional resources to sharpen their skills.
If you prefer a visual representation of this blog post, make sure to check out the corresponding infographic “Learn Data Science – 8 (Easy) Steps”.
What Is Data Science?
Data science is still a fuzzy concept. There have been many definitions, or attempts at definitions, and it should come as no surprise that some of these have been represented visually. The most significant starting point of this trend was in 2010, when Drew Conway presented a Venn diagram to define the concept “data science”. Data science sits in the center of the picture, as the result of combining hacking skills, mathematics and statistics knowledge, and substantive expertise.
Over the years, many Venn diagrams and other visual representations have circulated throughout the data science industry, some more successful than others. For a chronological overview of the most significant ones, check out the article Battle of the Data Science Venn Diagrams.
To make a long story short, in 2016 we got a slightly different image of what data science is. Matthew Mayo posted a visual representation made by Gregory Piatetsky-Shapiro. A lot is different. Two things stand out: data science is no longer in the center of the picture, and the approach to defining it has changed. Data science is now defined through its relation to other disciplines, such as Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Big Data (BD) and Data Mining (DM). Data science sits at the intersection of AI, ML and BD and has an intrinsic relation with DM, as it is considered the superset of data mining and its successor term.
These two visuals might seem completely different, but they do share a lot of similarities: the disciplines that are visualized in Piatetsky-Shapiro’s picture all require hacking skills, mathematics and statistics knowledge and substantive expertise or domain knowledge.
Data Scientist’s Educational Background
There have been a lot of surveys over the past few years on the educational background of data scientists, and their findings vary. In the O’Reilly Data Science Salary Survey of 2014, about 28% of the respondents had a Bachelor’s degree, while 44% had a Master’s degree and 20% had a Ph.D. Common backgrounds for data scientists are Mathematics/Statistics, Computer Science, and Engineering. The results that are represented in the infographic are from 2016. They are very similar to those of the O’Reilly survey.
In general, you could conclude that most data scientists have completed a Master’s degree or Ph.D. The field that you come from is of less importance, but you have an advantage if you have a quantitative background.
Step 1. Get Good at Stats, Maths and Machine Learning
The perspective on the definition of data science might have changed over the years, but data science has remained a somewhat technical occupation. A sound knowledge of statistics, mathematics, and machine learning is still considered a main requirement for anyone who wants to do data science.
Getting up to speed with these three can be a pain, especially for those who have no technical background whatsoever. Luckily, there are more than enough quality resources to help you out: Khan Academy offers online courses on a variety of mathematics topics that will undoubtedly be of great value to you, but make sure to also take a look at the Linear Algebra course from MIT OpenCourseWare. For statistics, material from DataCamp, Udacity and OpenIntro might help you, and for machine learning, you should keep an eye out for the content on DataCamp, Stanford Online and Coursera.
Step 2. Learn to Code
Developing your hacking skills is still something you need to do if you want to learn data science.
You can start by getting familiar with the computer science fundamentals: get to know the basic data structures and search algorithms. Then, step up to understanding how end-to-end development works: the things you work on will be integrated with other systems, so it’s best to understand how development works from beginning to end, from requirements gathering and analysis to testing and maintaining code. When you have grasped this concept, it’s time to pick a language. You can go for an open source language or a commercial one. Things to take into account in your decision are the learning curve, the industry you want to work in, the salary that comes with being proficient in the language, and so on.
Make your choice easier with the help of this infographic. DataCamp is there to assist you if you have chosen an open source programming language.
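To give an idea of what “basic data structures and search algorithms” can look like in practice, here is a minimal sketch of binary search over a sorted list, written in Python. The choice of Python, the function name binary_search, and the sample list of ages are all assumptions made for illustration; they do not come from the original post or the infographic.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in the sorted sequence items, or None if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # middle of the current search window
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return None


if __name__ == "__main__":
    # Hypothetical sample data: the list must already be sorted.
    ages = [19, 23, 31, 42, 57, 64]
    print(binary_search(ages, 42))  # index of 42 in the list
    print(binary_search(ages, 20))  # None, because 20 is not in the list
```

The data structure here is a plain sorted list, and the algorithm halves the search window on every step, which is why it scales much better than checking each element in turn. In everyday Python work you would likely reach for the standard library's bisect module instead of writing this by hand.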