By Harrison Jansma.
There are simpler alternatives that offer to sort the mess for you.
Sites like Dataquest, DataCamp, and Udacity all offer to teach you data science skills. Each creates an education program that shepherds you from topic to topic, and each requires little course planning on your part.
The problem? They cost too much, they don’t teach you how to apply concepts in a job setting, and they prevent you from exploring your own interests and passions.
There are free alternatives like edX and Coursera, which offer one-off courses diving into specific topics. If you learn well from videos or a classroom setting, these are excellent ways to learn data science.
If you learn well from reading, look at the Data Science From Scratch book. This textbook is a full learning plan that can be supplemented with online resources. You can find the full book online in PDF form (free), or get a physical copy from Amazon ($27).
These are just a few of the free resources that provide a detailed learning path for data science. There are many more.
To better understand the skills you need to acquire on your educational journey, in the next section I detail a broader curriculum guideline. This is intended to be high-level, and not just a list of courses to take or books to read.
A Curriculum Guideline
Programming is a fundamental skill of data scientists. Get comfortable with the syntax of Python. Understand how to run a Python program in several different ways (Jupyter notebook vs. command line vs. IDE).
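As a rough sketch of what "running a program many different ways" looks like in practice, here is a tiny script. The file name `mean_demo.py` and the `mean` helper are made up for illustration. Saved to a file, it runs from the command line with `python mean_demo.py`; pasted into a Jupyter cell, it runs when you execute the cell; opened in an IDE, it runs from the IDE's run button.

```python
# mean_demo.py (hypothetical file name, used only for this illustration)

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main():
    # A small made-up sample, just to have something to print.
    sample = [2, 4, 4, 5]
    print(f"mean of {sample} = {mean(sample)}")


# This guard is true when the file is run directly (command line, IDE)
# and also inside a Jupyter cell, but not when the file is imported.
if __name__ == "__main__":
    main()
```

Keeping the logic in functions and the entry point behind the `__main__` guard means the same code can later be imported by another script or notebook without printing anything on import.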
I took about a month to review the Python docs, the Hitchhiker’s Guide to Python, and coding challenges on CodeSignal.
Hint: Keep an ear out for common problem-solving techniques used by programmers (pronounced “algorithms”).
Statistics & Linear Algebra
Statistics and linear algebra are prerequisites for machine learning and data analysis. If you already have a solid understanding, spend a week or two brushing up on key concepts.
Focus especially hard on descriptive statistics. Being able to understand a data set is a skill worth its weight in gold.
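To make "descriptive statistics" concrete, here is a minimal sketch using only Python's built-in `statistics` module. The `page_views` numbers are invented for the example; swap in any column from a data set you care about.

```python
import statistics

# Made-up sample: daily page views for one week, with one unusual day.
page_views = [120, 135, 128, 410, 131, 125, 129]

summary = {
    "count": len(page_views),
    "mean": statistics.mean(page_views),
    "median": statistics.median(page_views),
    "stdev": statistics.stdev(page_views),  # sample standard deviation
    "min": min(page_views),
    "max": max(page_views),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

Notice that the single 410 day pulls the mean well above the median of 129. Spotting that kind of gap is exactly what a quick descriptive summary is for.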
NumPy, Pandas, & Matplotlib
Learn how to load, manipulate, and visualize data. Mastery of these libraries will be crucial to your personal projects.
Quick hint: Don’t feel like you have to memorize every method or function name; that comes with practice. If you forget, Google it.
Check out the Pandas docs, NumPy docs, and Matplotlib tutorials. There are better resources out there, but these are what I used.
Remember, the only way you will learn these libraries is by using them!
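Here is one small load, manipulate, visualize loop to practice on. It is a sketch, not a recipe: the CSV contents, the column names, and the output file name `revenue_by_product.png` are all invented, and the data is inlined so the example runs without any downloads.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw to a file so no display window is needed
import pandas as pd

# Made-up sales data, inlined as text so the example is self-contained.
csv_text = """product,units,unit_price
widget,10,2.5
gadget,4,10.0
widget,6,2.5
gadget,1,10.0
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a derived column, then total it per product.
df["revenue"] = df["units"] * df["unit_price"]
revenue_by_product = (
    df.groupby("product")["revenue"].sum().sort_values(ascending=False)
)
print(revenue_by_product)

# Visualize: a simple bar chart saved next to the script.
ax = revenue_by_product.plot(kind="bar", title="Revenue by product")
ax.set_ylabel("revenue")
ax.figure.tight_layout()
ax.figure.savefig("revenue_by_product.png")
```

With real data you would replace the `io.StringIO(csv_text)` argument with a file path, and in a Jupyter notebook you can drop the `Agg` line and let the chart render inline.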
Learn the theory and application of machine learning algorithms. Then apply the concepts you learn to real-world data that you care about.
Most beginners start by working with toy datasets from the UCI ML Repository. Play around with the data and go through guided ML tutorials.
The Scikit-learn documentation has excellent tutorials on the application of common algorithms. I also found this podcast to be a great (and free) educational resource behind the theory of ML. You can listen to it on your commute or while working out.
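A first pass at a toy data set might look like the sketch below. It uses the Iris data set that ships with scikit-learn and a k-nearest-neighbors classifier; the choice of model, the 25% test split, and `random_state=0` are my assumptions for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small, well-known toy data set (150 flowers, 4 measurements each).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier and measure how often it predicts correctly.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"test accuracy: {accuracy:.2f}")
```

Once this runs, try changing `n_neighbors` or swapping in another scikit-learn estimator and watch how the score moves. That kind of tinkering is where the theory starts to stick.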
Getting a job means being able to take real-world data and turn it into action.
To do this you will need to learn how to use a business’ computational resources to get, transform, and process data.
Database manipulation is a required skill set. You can learn how to manipulate databases with code on ModeAnalytics or Codecademy. You can also implement your own database (cheaply) on DigitalOcean.
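If you want to practice before setting up a hosted database, Python's built-in `sqlite3` module is enough to get a feel for manipulating a database with code. This is a minimal sketch: the `orders` table and its rows are invented, and the database lives in memory, so nothing is written to disk.

```python
import sqlite3

# An in-memory database: it disappears when the connection closes.
conn = sqlite3.connect(":memory:")

# Create a small, made-up table of orders.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 12.5)],
)
conn.commit()

# Aggregate with SQL: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, total in rows:
    print(f"{customer}: {total:.2f}")

conn.close()
```

The same `CREATE`, `INSERT`, and `SELECT ... GROUP BY` statements carry over to other SQL databases with only minor dialect changes, which is why SQL practice transfers so well between jobs.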
Another (often) required skill is version control. You can acquire this skill easily by creating a GitHub account and using the command line to commit your code daily.
When considering what other technologies to learn, it is important to think about your interests and passions. For example, if you are interested in web development, then look into the tools used by companies in that industry.
Advice for executing your curriculum.
1. Concepts will come at you faster than you can learn them.
There are literally thousands of web pages and forums explaining the use of common data science tools. Because of this, it is very easy to get side-tracked while learning online.
When you start researching a topic you need to hold your goal in mind. If you don’t, you risk getting caught up in whatever catchy link draws your eye.
The solution: get a good storage system to save interesting web resources. This way you can save material for later and focus on the topic that is relevant to you at the moment.
Warning: your reading list will quickly grow into the hundreds as you explore new topics that interest you. Don’t worry, this leads us to my second piece of advice.
2. Don’t stress. It’s a marathon, not a sprint.
Having a self-driven education can often feel like trying to read a never-ending library of knowledge.
If you’re going to be successful in data science you need to think of your education as a lifelong process.
Just remember, the process of learning is its own reward.
Throughout your educational journey, you will explore your interests and discover more about what drives you. The more you learn about yourself, the more enjoyment you will get out of learning.
3. Learn -> Apply -> Repeat
Don’t settle for just learning a concept and then moving to the next thing. The process of learning doesn’t stop until you can apply a concept to the real world.
4. Build a portfolio. It shows others they can trust you.
When it comes down to it, skepticism is one of the biggest obstacles you will face when learning data science.
This may come from others, or it may come from yourself.
Your portfolio is your way of showing the world that you are capable and confident in your own skills.
Because of this, building a portfolio is the single most important thing you can do while studying data science. A good portfolio can land you a job and make you a more confident data scientist.
Fill your portfolio with projects that you are proud of.
Did you build your own web app from scratch? Did you make your own IMDB database? Have you written an interesting data analysis of healthcare data?
Put it in your portfolio.
Just make sure write-ups are readable, the code is well documented, and the portfolio itself looks good.
Here is an aesthetically pleasing, yet simple, GitHub portfolio. For a more advanced portfolio, look into GitHub Pages (github.io) to host your own free website. (example)
5. Data Science + _______ = A Passionate Career
Fill in the blank.
Data science is a set of tools intended to make a change in the world. Some data scientists build computer vision systems to diagnose medical images, others traverse billions of data entries to find patterns in website user preferences.
The applications of data science are endless. That’s why it is important to find which applications excite you.
If you find topics that you are passionate about, you will be more willing to put in the work to make a great project. This leads to my favorite piece of advice in this article.
When you are learning, keep your eyes open for projects or ideas that excite you.
Once you find an industry that you are passionate about, make it your goal to acquire the skills and technical expertise needed in that business.
If you can do this, you will be primed to turn your hard work and dedication for learning into a passionate and successful career.
If you love making discoveries about the world and are fascinated by artificial intelligence, then you can break into the data science industry no matter what your situation is.
It won’t be easy.
To motivate your own education you will need perseverance and discipline. But if you are the type of person who can push yourself to improve, you are more than capable of mastering these skills on your own.
After all, that’s what being a data scientist is all about: being curious, self-driven, and passionate about finding answers.
Original. Reposted with permission.
Bio: Harrison Jansma is a self-taught data scientist. Over the last 9 months, Harrison left his job, started studying machine learning full time, and enrolled in a Master's program in Computer Science. Harrison is doing all of this because his passion and goal is implementing machine learning applications in the real world. This requires a strong understanding of predictive modeling and production environments.