Gain the confidence you need to apply machine learning in your daily work. With this practical guide, author Matthew Kirk shows you how to integrate and test machine learning algorithms in your code, without the academic subtext.
Featuring graphs and highlighted code examples throughout, the book includes tests built with Python’s NumPy, pandas, scikit-learn, and SciPy data science libraries. If you’re a software engineer or business analyst interested in data science, this book will help you:
Reference real-world examples to test each algorithm through engaging, hands-on exercises
Apply test-driven development (TDD) to write and run tests before you start coding
Explore techniques for improving your machine-learning models with data extraction and feature development
Watch out for the risks of machine learning, such as underfitting or overfitting data
Work with K-Nearest Neighbors, neural networks, clustering, and other algorithms
Chapter 1: Probably Approximately Correct Software
Writing Software Right
Writing the Right Software
The Plan for the Book
Chapter 2: A Quick Introduction to Machine Learning
What Is Machine Learning?
What Can Machine Learning Accomplish?
Mathematical Notation Used Throughout the Book
Chapter 3: K-Nearest Neighbors
How Do You Determine Whether You Want to Buy a House?
How Valuable Is That House?
What Is a Neighborhood?
Mr. K’s Nearest Neighborhood
Curse of Dimensionality
How Do We Pick K?
Valuing Houses in Seattle
Chapter 4: Naive Bayesian Classification
Using Bayes’ Theorem to Find Fraudulent Orders
Inverse Conditional Probability (aka Bayes’ Theorem)
Naive Bayesian Classifier
Naiveté in Bayesian Reasoning
Chapter 5: Decision Trees and Random Forests
The Nuances of Mushrooms
Classifying Mushrooms Using a Folk Theorem
Finding an Optimal Switch Point
Chapter 6: Hidden Markov Models
Tracking User Behavior Using State Machines
Emissions/Observations of Underlying States
Simplification Through the Markov Assumption
Hidden Markov Model
Evaluation: Forward-Backward Algorithm
The Decoding Problem Through the Viterbi Algorithm
Matthew Kirk has always been “the math guy” to those who know him best. He started his career as a quantitative financial analyst with Parametric Portfolio. While there, he studied momentum and reversal effects in emerging markets and optimized their 30 billion dollar portfolio.
He left the finance industry to build the current version of Wetpaint.com, an entertainment website that is visited by over 10 million unique visitors each month. One of his accomplishments while there was the initial prototype of their patent-pending Social Publishing Platform, which optimizes their publication strategy for Facebook posting.
He left Wetpaint to work with a small startup in Kansas City called SocialVolt as their Chief Scientist. While there, he worked on sentiment analysis tools and spam filtering of social media data.
In 2012 he started Modulus 7, which is a data science and startup consulting firm. His clients have included Ritani, The Clymb, Siren, Sqoop, and many others.
Matthew holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He is also studying for his M.S. in Computer Science at the Georgia Institute of Technology.
He has spoken around the world about using machine learning and data science with Ruby. When he’s not working, he enjoys listening to his 2000+ vinyl record collection on his Thorens TD160 Mk2 turntable.
The animal on the cover of Thoughtful Machine Learning with Python is the Cuban solenodon (Solenodon cubanus), also known as the almiqui. The Cuban solenodon is a small mammal found only in the Oriente province of Cuba. They are similar in appearance to members of the more common shrew family, with long snouts, small eyes, and a hairless tail.
The diet of the Cuban solenodon is varied, consisting of insects, fungi, and fruits, but also other small animals, which they incapacitate with venomous saliva. Males and females only meet up to mate, and the male takes no part in raising the young. Cuban solenodons are nocturnal and live in subterranean burrows.
The total number of Cuban solenodons is unknown, as they are rarely seen in the wild. At one point they were considered to be extinct, but they are now classified as endangered. Predation from the mongoose (introduced during Spanish colonization) as well as habitat loss from recent construction have negatively impacted the Cuban solenodon population.
Comments about O'Reilly Thoughtful Machine Learning with Python:
The book seems reasonably clear and well written. The code, on the other hand, needs a lot of work. I'm running Python 3.5 in IntelliJ and cannot get the first example to behave. The data is littered with values that should be numbers but instead are "#< Geocoder::Result::Bing:0x007fe9a1b0e478>". There is no main() entry method, so I have to guess at how to run the example. And this is for the first chapter!
I look forward to corrected code so that the book is usable.
Bottom Line: No, I would not recommend this to a friend.
Merchant response: "Phil, thanks for your review. As I said in the book, the examples work with Python 2.7 only right now, though I have been planning a reprint with some updates to get everything working under Python 3.5.
I can add a main method to all of the GitHub code but decided to leave it out of the book because I didn't want it to detract from the code examples.
I'll work on getting this all working on Python 3.5 and IntelliJ so that people such as yourself can use the examples more easily. When writing the book I focused on getting it working with vanilla Python 2.7 first."