6 Books Every Data Scientist Should Keep Nearby

The best way to stay current is to keep brushing up on your knowledge while also building experience. That combination is what helps you succeed in the industry.

By Kayla Matthews, Productivity Bytes.


1. Machine Learning Yearning, by Andrew Ng

Machine learning and data science are closely connected, because data and the data science processes are what develop and produce accurate machine learning systems. While the two fields are not necessarily synonymous, they are related, so it’s a great idea to brush up on your understanding of machine learning if you work in the data industry.

Some insights you can learn from this excellent resource are how often you should be collecting training data, how to use end-to-end deep learning and how to facilitate the sharing of data and stats with a system you are creating.

Machine Learning Yearning | free

2. Hadoop: The Definitive Guide, by Tom White

Apache Hadoop is one of the primary frameworks used to process and manage large quantities of data. Anyone working in programming or data science is likely to encounter the platform, and it remains one of the most efficient ways to develop a scalable system.

It just so happens Tom White, an expert Hadoop consultant and Apache Software Foundation member, wrote the definitive guide, so it’s filled to the brim with insights and useful resources. More importantly, it will walk you through the entire process and setup of working with Hadoop clusters.

Apache Spark is another important platform you might want to spend time learning.

Hadoop: The Definitive Guide | $40+

3. Predictive Analytics, by Eric Siegel

Titled Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, this book explains in painstaking detail how you can take most forms of data and information, and turn them into actionable predictions or insights. The key is to help professionals better understand their audience. You will learn how to identify what products and services they buy, what locations they visit, what content resonates with them and much more.

It is the job of a data scientist to look at raw, unfiltered data and identify usable trends and patterns. This book will not only help you do that, but also help you come up with the predictive algorithms needed to improve future operations and processes. Consider it the Bible of predictive analytics.

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die | $16+

4. Storytelling With Data, by Cole Nussbaumer Knaflic

Storytelling With Data: A Data Visualization Guide for Business Professionals is a crucial read for anyone in the industry, even those who may not be directly connected to the enterprise or business world. Why?

Simply put, the book deals with the organization and extraction of vast quantities of data. That means getting rid of excess and unclear data, improving data collection processes and coming up with relevant, practical data visualizations.

It’s the definitive guide to learning what you should do with all the useful data you collect and how to go about doing it. Many of the insights apply to tech in general, and would be helpful even for those outside the profession.

Storytelling With Data | $20+

5. Inflection Point, by Scott Stawski

Also titled How the Convergence of Cloud, Mobility, Apps and Data Will Shape the Future of Business, this guide is necessary to understand the evolution of the current data analytics and cloud computing industries.

Of particular note is Stawski’s direct focus on raw data storage and mining systems, how they can be deployed and how they are in use in the real world.

More than just a theoretical guide, it reveals actual working systems and describes how you can adapt them to fit your business or enterprise.

The important part is that you come away from the book with a clear understanding of how to deploy these tools and platforms within your organization.

Inflection Point: How the Convergence of Cloud, Mobility, Apps and Data Will Shape the Future of Business | $40+

6. An Introduction to Statistical Learning With Applications in R, by Gareth James et al.

Statistical learning and related methods are necessary to work in data science. This textbook is designed to help anyone and everyone, from an undergraduate to a Ph.D. student, understand the concepts.

Of course, it also offers a great selection of R labs and practices, with detailed explanations and walkthroughs. The idea is that you can use it as a direct resource while practicing data science, especially during the educational phase.

Plus, it’s a great resource to have around and look back at regularly. The concepts and information are practical for daily applications.

An Introduction to Statistical Learning With Applications in R | free

Bio: Kayla Matthews discusses technology and big data on publications like The Week, The Data Center Journal and VentureBeat, and has been writing for more than five years. To read more posts from Kayla, subscribe to her blog Productivity Bytes.