Learn how to apply test-driven development (TDD) to machine-learning algorithms—and catch mistakes that could sink your analysis. In this practical guide, author Matthew Kirk takes you through the principles of TDD and machine learning, and shows you how to apply TDD to several machine-learning algorithms, including Naive Bayesian classifiers and Neural Networks.
Machine-learning algorithms often have tests baked in, but they can’t account for human errors in coding. Rather than blindly rely on machine-learning results as many researchers have, you can mitigate the risk of errors with TDD and write clean, stable machine-learning code. If you’re familiar with Ruby 2.1, you’re ready to start.
Apply TDD to write and run tests before you start coding
Learn the best uses and tradeoffs of eight machine-learning algorithms
Use real-world examples to test each algorithm through engaging, hands-on exercises
Understand the similarities between TDD and the scientific method for validating solutions
Be aware of the risks of machine learning, such as underfitting and overfitting data
Explore techniques for improving your machine-learning models or data extraction
Chapter 1: Test-Driven Machine Learning
History of Test-Driven Development
TDD and the Scientific Method
Risks with Machine Learning
What to Test for to Reduce Risks
Chapter 2: A Quick Introduction to Machine Learning
What Is Machine Learning?
What Can Machine Learning Accomplish?
Mathematical Notation Used Throughout the Book
Chapter 3: K-Nearest Neighbors Classification
History of K-Nearest Neighbors Classification
House Happiness Based on a Neighborhood
How Do You Pick K?
What Makes a Neighbor “Near”?
Beard and Glasses Detection Using KNN and OpenCV
Chapter 4: Naive Bayesian Classification
Using Bayes’s Theorem to Find Fraudulent Orders
Naive Bayesian Classifier
Chapter 5: Hidden Markov Models
Tracking User Behavior Using State Machines
Evaluation: Forward-Backward Algorithm
The Decoding Problem through the Viterbi Algorithm
The Learning Problem
Part-of-Speech Tagging with the Brown Corpus
Chapter 6: Support Vector Machines
Solving the Loyalty Mapping Problem
Derivation of SVM
Using SVM to Determine Sentiment
Chapter 7: Neural Networks
History of Neural Networks
What Is an Artificial Neural Network?
Building Neural Networks
Using a Neural Network to Classify a Language
Expectation Maximization (EM) Clustering
The Impossibility Theorem
Chapter 9: Kernel Ridge Regression
Linear Regression Applied to Collaborative Filtering
Matthew Kirk holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He started Modulus 7, a data science and Ruby development consulting firm, in early 2012. Matthew has spoken around the world about using machine learning and data science with Ruby.
The animal on the cover of Thoughtful Machine Learning is a Eurasian eagle-owl (Bubo bubo), which is found, as its name suggests, primarily in Eurasia. With a wingspan of 74 inches and a total length of 30 inches for females (males are slightly smaller), the eagle-owl is the largest species of owl. The eagle-owl has distinctive ear tufts and orange eyes. It has a buff underbelly that is streaked with darker color.
Mostly found in mountainous regions or coniferous forests, the eagle-owl is a nocturnal predator that preys on small mammals, reptiles, amphibians, fish, large insects, and earthworms. Eagle-owls prefer a concealed location for breeding, such as gullies or among rocks. They lay up to six eggs in the nest at intervals, so the eggs hatch at different times. After the eggs are laid, the female incubates them and broods the young while the male provides for her and for the nestlings. After all of the eggs have hatched, parental care continues for another five months.
The Eurasian eagle-owl has a number of vocalizations, including its song, which can be heard at great distances. It is a deep ooh-hu; the male emphasizes the first syllable, whereas females have a more high-pitched uh-hu song. In close quarters, eagle-owls express annoyance with bill-clicking and cat-like spitting, sometimes taking on a defensive posture: lowered head, ruffled back feathers, fanned tail, and spread wings.
Healthy adults have no natural predators, which makes them apex predators, though they can be mobbed by smaller birds such as hawks or other owls. The leading causes of death, however, are man-made: electrocution, traffic accidents, and shooting. The eagle-owl can live up to 20 years in the wild; in captivity, without having to face difficult natural conditions, it can live much longer, with reports of up to 60 years in zoo settings. The Eurasian eagle-owl has a habitat that spans 12 million square miles across Europe and Asia, and its population is estimated at between 250,000 and 2.5 million individuals, landing it in the IUCN's "least concern" category. Eagle-owls can usually be found in large numbers in areas sparsely populated by humans; however, they have also been observed living on farmland or in park-like settings in European cities.
The cover image is from the Braukhaus Lexicon. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag's Ubuntu Mono.
Comments about O'Reilly Thoughtful Machine Learning:
I get it that this book is in Early Release form, but I am rather shocked that O'Reilly would let this get out the door in this form under any designation. ("Early Outline" edition?) My hope is that this book will be substantially improved, and eventually I'll find that it has important content that I need to learn. The first couple of chapters? Confused presentation, incorrect similes, and dubious references. "Early Release" should only be an option for proven authors, who can be trusted to provide close-to-publishable quality in early drafts. This is far from publishable. I can't believe O'Reilly would have ever let this into the wild 10 years ago.
Maybe in 9 months I'll take a look at the latest revision, in hopes that it will by then be worth attempting later chapters.
Bottom Line: No, I would not recommend this to a friend