Thoughtful Machine Learning with Python
A Test-Driven Approach
By Matthew Kirk
Publisher: O'Reilly Media
Final Release Date: January 2017
Pages: 216

Gain the confidence you need to apply machine learning in your daily work. With this practical guide, author Matthew Kirk shows you how to integrate and test machine learning algorithms in your code, without the academic subtext. Illustrated with graphs and highlighted code examples throughout, the book includes tests written with Python’s NumPy, Pandas, scikit-learn, and SciPy data science libraries.

If you’re a software engineer or business analyst interested in data science, this book will help you:

Reference real-world examples to test each algorithm through engaging, hands-on exercises

Apply test-driven development (TDD) to write and run tests before you start coding

Explore techniques for improving your machine-learning models with data extraction and feature development

Watch out for the risks of machine learning, such as underfitting or overfitting data

How Do You Determine Whether You Want to Buy a House?
How Valuable Is That House?
Hedonic Regression
What Is a Neighborhood?
K-Nearest Neighbors
Mr. K's Nearest Neighborhood
Distances
Curse of Dimensionality
How Do We Pick K?
Valuing Houses in Seattle
Conclusion

Chapter 4 Naive Bayesian Classification
Using Bayes' Theorem to Find Fraudulent Orders
Conditional Probabilities
Probability Symbols
Inverse Conditional Probability (aka Bayes' Theorem)
Naive Bayesian Classifier
Naiveté in Bayesian Reasoning
Pseudocount
Spam Filter
Conclusion

Chapter 5 Decision Trees and Random Forests
The Nuances of Mushrooms
Classifying Mushrooms Using a Folk Theorem
Finding an Optimal Switch Point
Pruning Trees
Conclusion

Chapter 6 Hidden Markov Models
Tracking User Behavior Using State Machines
Emissions/Observations of Underlying States
Simplification Through the Markov Assumption
Hidden Markov Model
Evaluation: Forward-Backward Algorithm
The Decoding Problem Through the Viterbi Algorithm
The Learning Problem
Part-of-Speech Tagging with the Brown Corpus
Conclusion

Chapter 7 Support Vector Machines
Customer Happiness as a Function of What They Say
The Theory Behind SVMs
Sentiment Analyzer
Aggregating Sentiment
Mapping Sentiment to Bottom Line
Conclusion

Chapter 8 Neural Networks
What Is a Neural Network?
History of Neural Nets
Boolean Logic
Perceptrons
How to Construct Feed-Forward Neural Nets
Building Neural Networks
Using a Neural Network to Classify a Language

Chapter 9 Clustering
Studying Data Without Any Bias
User Cohorts
Testing Cluster Mappings
K-Means Clustering
EM Clustering
The Impossibility Theorem
Example: Categorizing Music
Conclusion

Chapter 10 Improving Models and Data Extraction
Debate Club
Picking Better Data
Feature Transformation and Matrix Factorization
Ensemble Learning
Conclusion

Chapter 11 Putting It Together: Conclusion
Machine Learning Algorithms Revisited
How to Use This Information to Solve Problems
What's Next for You?

Print ISBN: 978-1-4919-2413-6 | ISBN 10: 1-4919-2413-6
Ebook ISBN: 978-1-4919-2407-5 | ISBN 10: 1-4919-2407-1

Matthew Kirk

Matthew Kirk has always been “the math guy” to those who know him best. He started his career as a quantitative financial analyst with Parametric Portfolio. While there, he studied momentum and reversal effects in emerging markets and optimized their 30 billion dollar portfolio.



He left the finance industry to build the current version of Wetpaint.com, an entertainment website visited by over 10 million unique visitors each month. One of his accomplishments while there was the initial prototype of their patent-pending Social Publishing Platform, which optimizes their publication strategy for Facebook posting.



He left Wetpaint to work with a small startup in Kansas City called SocialVolt as their Chief Scientist. While there, he worked on sentiment analysis tools and spam filtering of social media data.



In 2012 he started Modulus 7, a data science and startup consulting firm. His clients have included Ritani, The Clymb, Siren, Sqoop, and many others.



Matthew holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He is also studying for his M.S. in Computer Science at the Georgia Institute of Technology.



He has spoken around the world about using machine learning and data science with Ruby. When he's not working, he enjoys listening to his 2000+ vinyl record collection on his Thorens TD160 Mk2 turntable.

Colophon

The animal on the cover of Thoughtful Machine Learning with Python is the Cuban solenodon (Solenodon cubanus), also known as the almiqui. The Cuban solenodon is a small mammal found only in the Oriente province of Cuba. It is similar in appearance to members of the more common shrew family, with a long snout, small eyes, and a hairless tail. Its diet is varied, consisting of insects, fungi, and fruits, but also other small animals, which it incapacitates with venomous saliva. Males and females meet only to mate, and the male takes no part in raising the young. Cuban solenodons are nocturnal and live in subterranean burrows. The total number of Cuban solenodons is unknown, as they are rarely seen in the wild. At one point they were considered extinct, but they are now classified as endangered. Predation by the mongoose (introduced during Spanish colonization) and habitat loss from recent construction have reduced the Cuban solenodon population.