Learn how to apply test-driven development (TDD) to machine-learning algorithms—and catch mistakes that could sink your analysis. In this practical guide, author Matthew Kirk takes you through the principles of TDD and machine learning, and shows you how to apply TDD to several machine-learning algorithms, including Naive Bayesian classifiers and Neural Networks.
Machine-learning algorithms often have tests baked in, but they can’t account for human errors in coding. Rather than blindly rely on machine-learning results as many researchers have, you can mitigate the risk of errors with TDD and write clean, stable machine-learning code. If you’re familiar with Ruby 2.1, you’re ready to start.
Apply TDD to write and run tests before you start coding
Learn the best uses and tradeoffs of eight machine learning algorithms
Use real-world examples to test each algorithm through engaging, hands-on exercises
Understand the similarities between TDD and the scientific method for validating solutions
Be aware of the risks of machine learning, such as underfitting and overfitting data
Explore techniques for improving your machine-learning models or data extraction
Chapter 1: Test-Driven Machine Learning
History of Test-Driven Development
TDD and the Scientific Method
Risks with Machine Learning
What to Test for to Reduce Risks
Chapter 2: A Quick Introduction to Machine Learning
What Is Machine Learning?
What Can Machine Learning Accomplish?
Mathematical Notation Used Throughout the Book
Chapter 3: K-Nearest Neighbors Classification
History of K-Nearest Neighbors Classification
House Happiness Based on a Neighborhood
How Do You Pick K?
What Makes a Neighbor “Near”?
Beard and Glasses Detection Using KNN and OpenCV
Chapter 4: Naive Bayesian Classification
Using Bayes’s Theorem to Find Fraudulent Orders
Naive Bayesian Classifier
Chapter 5: Hidden Markov Models
Tracking User Behavior Using State Machines
Evaluation: Forward-Backward Algorithm
The Decoding Problem through the Viterbi Algorithm
The Learning Problem
Part-of-Speech Tagging with the Brown Corpus
Chapter 6: Support Vector Machines
Solving the Loyalty Mapping Problem
Derivation of SVM
Using SVM to Determine Sentiment
Chapter 7: Neural Networks
History of Neural Networks
What Is an Artificial Neural Network?
Building Neural Networks
Using a Neural Network to Classify a Language
Expectation Maximization (EM) Clustering
The Impossibility Theorem
Chapter 9: Kernel Ridge Regression
Linear Regression Applied to Collaborative Filtering
Matthew Kirk holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He started Modulus 7, a data science and Ruby development consulting firm, in early 2012. Matthew has spoken around the world about using machine learning and data science with Ruby.
The animal on the cover of Thoughtful Machine Learning is a Eurasian eagle-owl (Bubo bubo), which is found, as its name suggests, primarily in Eurasia. With a wingspan of 74 inches and a total length of 30 inches for females (males are slightly smaller), the eagle-owl is the largest species of owl. The eagle-owl has distinctive ear tufts and orange eyes. It has a buff underbelly that is streaked with darker color.
Mostly found in mountainous regions or coniferous forests, the eagle-owl is a nocturnal predator that preys on small mammals, reptiles, amphibians, fish, large insects, and earthworms. Eagle-owls prefer a concealed location for breeding, such as a gully or a spot among rocks. They lay up to six eggs at intervals, so the eggs hatch at different times. After the eggs are laid, the female incubates them and broods the young while the male provides for her and for the nestlings. After all of the eggs have hatched, parental care continues for another five months.
The Eurasian eagle-owl has a number of vocalizations, including its song, which can be heard at great distances. It is a deep ooh-hu; the male emphasizes the first syllable, whereas females have a more high-pitched uh-hu song. In close quarters, eagle-owls express annoyance with bill-clicking and cat-like spitting, sometimes taking on a defensive posture: lowered head, ruffled back feathers, fanned tail, and spread wings.
Healthy adults have no natural predators, which makes them apex predators, though they can be mobbed by smaller birds such as hawks or other owls. The leading causes of death, however, are man-made: electrocution, traffic accidents, and shooting. The eagle-owl can live up to 20 years in the wild; in captivity, without having to face difficult natural conditions, it can live much longer, with reports of up to 60 years in zoo settings. The Eurasian eagle-owl has a habitat that spans 12 million square miles across Europe and Asia, and its population is estimated at between 250,000 and 2.5 million individuals, landing it in the IUCN's "least concern" category. Eagle-owls are usually found in large numbers in areas sparsely populated by humans; however, they have also been observed living on farmland or in park-like settings in European cities.
The cover image is from the Braukhaus Lexicon. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag's Ubuntu Mono.
Comments about O'Reilly Thoughtful Machine Learning:
I am an academic who studies machine learning and a Python programmer. For those two reasons, I thought that Thoughtful Machine Learning wouldn't be right for me. However, I had a friend who was interested in getting into machine learning, so we decided we'd read through the book together, one chapter at a time.
That was a great idea.
This book is a short read, and a chapter a day is very easily digestible. With each chapter you get to explore ML topics, from the standard Naive Bayes to things that aren't typically included in an entry-level ML book, like SVMs and HMMs. Matt does a great job of creating clear examples and uses non-academic language to explain things. His code doesn't rely on libraries and demonstrates the effectiveness of these techniques.
What that means is that if you're new to ML, this is a good way to get started with it. If you're an old hand, it's a fresh perspective and a way to learn how to discuss these topics in a fresh manner.
I am also a practitioner, and I thought it was a very novel thesis to include TDD with ML.
Bottom Line: Yes, I would recommend this to a friend.