Learning about data science

Thomas Dickson

Over the past few years I’ve had the opportunity to read a range of textbooks on Data Science and machine learning. I find the problem with reviewers is that it’s hard to judge how useful their opinion is unless you can see what they’ve already read. This page lists my opinions on textbooks covering different areas of Data Science.

I use Data Science as a catch-all term for anything to do with doing stats with a computer. Wikipedia has a good overview of what it’s all about and is a good place to start.

An Introduction to Statistical Learning

This textbook presents and applies a range of different statistical techniques to investigate various case studies. Each chapter has a problem section where the writers present a range of problems for you to solve, perhaps using the R programming language which is used throughout the book.

This textbook is a great introduction to the subject. I use this if I need a quick review of what techniques exist to solve different types of problem.

Elements of Statistical Learning

Elements of Statistical Learning offers a more rigorous exposition of the key methods of statistical learning. The theory behind the techniques introduced in An Introduction to Statistical Learning (ISL) is covered in great detail. The questions are challenging - perhaps the sort of thing to keep you occupied through another lockdown. Detailed notes and solutions have been written by a pair of absolute heroes and can be found here.

I use this book if I need to understand the theory of a technique in more detail. I found the questions challenging and rewarding - I’d consider working through them more thoroughly if I were preparing for a specialised Data Scientist role.

Linear Algebra and Learning from Data

This book is a tour de force through linear algebra, statistics and optimisation, and some of their applications in Data Science. It’s a great undergraduate-level textbook, as each topic comes with problems which gently tease out your understanding of it. I’ve reviewed the book and course in another post.

I probably wouldn’t read this book again but I thought it was excellent for helping me cover the fundamentals.

Doing Data Science

For a while this book was my only in-depth insight into what Data Science is actually like “in industry” (that famous phrase). The book provides an overview of the whole Data Science process and explains some of the key concepts. Various chapters describe how Data Science has been applied to recommendation engines, fraud detection and epidemiology. The chapter on Claudia Perlich’s experiences competing in Data Science competitions and working in industry gives a great insight into how a successful data scientist operates.

This book has helped me a lot in my current job as it explains the decision-making behind solving various problems. It pairs excellently with the other books as it illustrates how mathematical techniques might be applied rather than shoving you out the door with a pile of equations.