From the Editor’s Bookshelf: My Favorite Titles for Data Science and Machine Learning

As a practicing data scientist, I’ve spent years building up a library of academic and practical resources that I routinely draw upon in my work. Although my library is vast, a select group of books occupies a prominent position on my desk. I’ve been asked about my “favorite titles” list often enough that I thought I’d write this article for my readers. Hopefully you’ll appreciate and benefit from my favorite books as much as I have over the years. If you’d like to contribute others to the list, please feel free to comment on this post.

The books in the list below appear in order of my own personal preference. When available, I’ve included links for you to download free PDF copies.

Machine Learning and Data Science

The Elements of Statistical Learning – Data Mining, Inference, and Prediction, 2nd Edition by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This is my “go to” text when I’m researching the underlying theory behind my favorite machine learning algorithms. This text is known by many as “the Machine Learning Bible.” I affectionately refer to it as “ESL,” and I often find myself picking it up to refresh my knowledge of various theoretical areas of machine learning. The content is highly mathematical and at a graduate level. The authors are statistics professors at Stanford. Highly recommended! Download free PDF HERE.

An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. This text, which I call “ISL,” is a simplified version of ESL, with less breadth in statistical learning and less math. The chapters are divided into theory sections and code sections with actual examples in R. I really like this book since I use R pretty much exclusively for my work. Highly recommended if you use R. Two of the authors, Hastie and Tibshirani, also co-authored ESL. Download free PDF HERE.

Learning from Data: A Short Course by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. I like this text for its different perspective on statistical learning. It is highly mathematical. Caltech offers a free online course, including video lectures, based on the book. Download free PDF HERE.

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. I don’t usually get excited about a new book in a field in which I’ve been deeply involved for a long time, but I eagerly anticipated this timely and useful new resource. I first learned of the book while attending last year’s GTC Conference hosted by AI powerhouse NVIDIA. “Deep Learning,” written by three luminaries in the field, is destined to be considered the AI bible moving forward. The book is available for free as HTML HERE.

Applied Predictive Modeling by Max Kuhn and Kjell Johnson. This is a very useful text on statistical learning, especially if you use R. I was sold on this book after hearing a presentation by Dr. Kuhn at the useR! 2014 conference. Kuhn is Director of Nonclinical Statistics at Pfizer. The book focuses on the well-known caret R package, which Kuhn created. The caret package (short for _C_lassification _A_nd _RE_gression _T_raining) is a set of functions that attempt to streamline the process of creating predictive models. The package contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, and variable importance estimation, as well as other functionality.
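
For readers who haven’t seen resampling-based model tuning before, here is a rough sketch of the idea in plain Python. To be clear, this is not caret and does not use caret’s API (caret is an R package); the function names, the k-nearest-neighbors model, the synthetic data, and the candidate values are all my own assumptions, chosen only to illustrate k-fold cross-validation as a way of picking a tuning parameter.

```python
import math
import random


def k_fold_indices(n, n_folds, rng):
    """Shuffle the indices 0..n-1 and deal them into n_folds roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cv_rmse(data, k, folds):
    """Cross-validated RMSE of k-NN regression for one candidate value of k."""
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        # Fit on every point outside the held-out fold, score on the fold itself.
        train = [data[i] for i in range(len(data)) if i not in held]
        for i in held_out:
            x, target = data[i]
            sq_errors.append((knn_predict(train, x, k) - target) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def tune_k(data, candidates, n_folds=5, seed=0):
    """Return the candidate k with the lowest cross-validated RMSE, plus all scores."""
    folds = k_fold_indices(len(data), n_folds, random.Random(seed))
    scores = {k: cv_rmse(data, k, folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(42)
xs = [rng.uniform(0, 6) for _ in range(120)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

best_k, scores = tune_k(data, candidates=[1, 3, 5, 9, 15, 25])
for k, rmse in scores.items():
    print(f"k={k:>2}  CV RMSE={rmse:.3f}")
print("selected k:", best_k)
```

The same folds are reused for every candidate so that the scores are compared on equal footing. This sketch covers only the tuning step; it leaves out pre-processing, feature selection, and variable importance.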

Statistical Learning with Sparsity – The Lasso and Generalizations by Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Sparsity is the central theme of the book. In general, a sparse statistical model is one in which only a relatively small number of predictors play an important role, and it is therefore much easier to estimate and interpret than a dense model. This book presents methods that exploit sparsity to help recover the underlying signal in the data set. In summary, the advantages of sparsity are interpretability of the fitted model and computational convenience. A third advantage has recently emerged from some deep mathematical analyses in this field of research; it has become known as the “bet on sparsity” principle. Download free PDF HERE.
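
To make the idea of a sparse model concrete, here is a minimal sketch of the lasso fitted by cyclic coordinate descent, written in plain Python. This is my own illustration rather than code from the book: the function names, the synthetic data (200 observations, 10 predictors, only the first two of which matter), and the penalty value `lam=0.1` are all assumptions. It minimizes (1/(2n))·||y − Xb||² + lam·||b||₁, and the soft-thresholding step is what sets small coefficients exactly to zero.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def make_sparse_data(n=200, p=10, seed=0):
    """Synthetic regression data in which only the first two predictors matter."""
    rng = random.Random(seed)
    true_beta = [3.0, -2.0] + [0.0] * (p - 2)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.1) for row in X]
    return X, y, true_beta


X, y, true_beta = make_sparse_data()
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
n_nonzero = sum(1 for b in beta_hat if b != 0.0)
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
print("non-zero coefficients:", n_nonzero, "of", len(beta_hat))
```

On data like this, the two relevant coefficients should come out slightly shrunk toward zero and the irrelevant ones should be exactly zero, which is the interpretability benefit described above. A larger `lam` gives a sparser fit at the cost of more shrinkage.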

Machine Learning for Hackers by Drew Conway and John Myles White. What this book delivers is the authors’ wisdom on how to work with data, how to explore it, the steps required to manipulate it, and finally how to use machine learning methods. For each topic, they open with a discussion of the problem domain, the tools to be used, and sometimes a playful example. They then work through a substantive example, and the narrative in the text is where they shine.


There are many books that can enhance your abilities as a data scientist (and I probably own most of them), but the following few titles have served me best over the years. I have found that nearly all the mathematical techniques found in the machine learning books mentioned above are included in these texts.

Introduction to Linear Algebra, by Gilbert Strang. I have a large section of mathematics books including several on the subject of linear algebra. For many years my “go to” text on linear algebra was an old 2nd edition of MIT Professor Gilbert Strang’s seminal book on the subject that I picked up at a swap meet. To my surprise, the good professor sent me a copy of his latest and greatest 5th edition of “Introduction to Linear Algebra” (Wellesley-Cambridge Press).

Calculus Volume I, 2nd Edition by Tom Apostol. You’ll need a Calculus textbook of proven quality to help you fully understand the mathematics behind machine learning. The two-volume series “Calculus” by revered Caltech mathematics professor Tom M. Apostol is the best choice in my opinion. I first learned of these books as a computer science/mathematics undergrad at UCLA, where they were used in the honors Calculus courses. Later, I found that Volume I was the required book for freshman Calculus at Caltech. Volume I has everything you’ll need to get up to speed with limits, derivatives, integrals, and differential equations. It also includes a nice introduction to linear algebra basics. MIT OpenCourseWare offers a beginning Calculus class that uses this book as its text. Download free PDF HERE.

Calculus Volume II, 2nd Edition by Tom Apostol. A continuation of the topics found in Volume I, this text includes useful topics such as: more on linear algebra, determinants, eigenvalues and eigenvectors, more on differential equations, Calculus of probability theory, and an introduction to numerical analysis. Download free PDF HERE.

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideBIGDATA. In addition to being a tech journalist, Daniel is a practicing data scientist, author, and educator, and sits on a number of advisory boards for various start-up companies.
