Best Sellers Rank: #16,031 in Books; #2 in Mathematical Analysis (Books); #3 in Programming Algorithms; #15 in Mathematics Study & Teaching (Books)
Customer Reviews: 4.5 out of 5 stars (50 reviews)
Product Information
From the Publisher
What was your motivation for writing this book?
When I started in machine learning, I saw most of my colleagues struggle with the basic mathematics behind neural network models. Even though they were great engineers, they made basic modeling mistakes, like using the mean-squared error for classification, having trouble fixing tensor shape mismatch errors, and similar things.
So, I started to educate them about fundamental mathematical topics, like how matrix multiplication works or what the mean-squared error and cross-entropy loss are. Soon, I realized that these intuitive explanations were also missing from the machine learning literature: most books either completely ignored the mathematics behind machine learning or were way too advanced to be useful.
Because of this, I started posting my explanations as Twitter threads, and they instantly became a smash hit. Soon, I had enough material for a book, and four years later, here we are with a ~770-page tome aiming to bring mathematics closer to machine learning professionals.
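As a rough illustration of the modeling mistake mentioned in this answer (using the mean-squared error for classification), the sketch below compares the two losses on a single binary example whose true label is 1. It is not taken from the book: the function names `squared_error` and `binary_cross_entropy` and the sample probabilities are made up for illustration, and plain Python is used to keep it dependency-free.

```python
import math


def squared_error(y_true, p):
    """Squared error between a 0/1 label and a predicted probability."""
    return (y_true - p) ** 2


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Cross-entropy loss for one binary label and a predicted probability."""
    # Clip p away from 0 and 1 so that log() is always defined.
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# The true label is 1; the predictions go from confident-and-right
# to confident-and-wrong.
y_true = 1
for p in (0.9, 0.5, 0.1, 0.01, 0.001):
    se = squared_error(y_true, p)
    ce = binary_cross_entropy(y_true, p)
    print(f"p={p:<6} squared error={se:.4f}  cross-entropy={ce:.4f}")
```

For a 0/1 label, the squared error can never exceed 1, however wrong the prediction is, while the cross-entropy keeps growing as the predicted probability of the true class approaches 0. That stronger penalty on confident mistakes is one common argument for preferring cross-entropy in classification.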
Why should ML professionals learn about the underlying mathematics?
Because mathematics is the language of machine learning!
Advanced machine learning libraries like PyTorch or scikit-learn hide most of the complexities from us; training a model can be as simple as calling the .fit method.
On the one hand, this is great! Abstracting the technical details away from the user makes it possible to build cutting-edge solutions in a matter of days. Lowering the barrier of entry is one of the reasons behind the explosion of AI we see today.
On the other hand, if one has to go beyond the ready-made solutions, a user-level understanding is not enough. To beat the state-of-the-art or build something novel, you can’t get away without looking under the hood. I own an old Volvo, and even though I love that car, I’m a frequent client at the car shop. My mechanic is really great at diagnosing rare problems. He can tell precisely what’s wrong by driving the car a couple of laps around the block.
Fluency in mathematics has a similar effect in machine learning.
How is this book different from other machine learning books?
Most books are written from either a purely practical or a purely theoretical perspective. In my experience, the learning curve is extremely steep for most theory-focused books. Even with a PhD in mathematics, there are textbooks where I get stuck in the first chapter, leaving the rest of the book inaccessible to me. I wanted to change that. So, I had three guiding principles in mind.
First, I wanted to “minmax” mathematics in the sense that the knowledge in the book should be maximally useful, but minimal in quantity.
Second, explanations should be intuitive and visual, leaning toward user-friendliness rather than mathematical precision. We have machine learning in mind, not mathematics.
Third, every concept and algorithm should be implemented from scratch in Python. This is essential to understanding mathematics. In the book, we learn NumPy along with the math, implementing algorithms such as the singular value decomposition or the omnipresent gradient descent.
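To give a flavor of what "from scratch" can look like, here is a minimal sketch of gradient descent on a one-variable function. It is not the book's implementation: the book works with NumPy, whereas this version uses plain Python to stay dependency-free, and the names `gradient_descent`, `f`, and `f_grad`, the learning rate, and the step count are all illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x


def f(x):
    """Example objective: a parabola with its minimum at x = 3."""
    return (x - 3.0) ** 2


def f_grad(x):
    """Derivative of f, worked out by hand."""
    return 2.0 * (x - 3.0)


x_min = gradient_descent(f_grad, x0=0.0)
print(f"approximate minimizer: {x_min:.6f}, f(x_min) = {f(x_min):.2e}")
```

With these settings, each step shrinks the distance to the minimum by a constant factor of 0.8, so the iterate ends up very close to 3. A learning rate that is too large (above 1.0 for this particular function) would make the iterates diverge instead.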