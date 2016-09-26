|
Publisher: O'Reilly Media
Final Release Date: September 2016
Pages: 394
Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.
You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.
With this book, you’ll learn:
- Fundamental concepts and applications of machine learning
- Advantages and shortcomings of widely used machine learning algorithms
- How to represent data processed by machine learning, including which data aspects to focus on
- Advanced methods for model evaluation and parameter tuning
- The concept of pipelines for chaining models and encapsulating your workflow
- Methods for working with text data, including text-specific processing techniques
- Suggestions for improving your machine learning and data science skills
Chapter 1: Introduction
Why Machine Learning?
Why Python?
scikit-learn
Essential Libraries and Tools
Python 2 Versus Python 3
Versions Used in this Book
A First Application: Classifying Iris Species
Summary and Outlook
Chapter 2: Supervised Learning
Classification and Regression
Generalization, Overfitting, and Underfitting
Supervised Machine Learning Algorithms
Uncertainty Estimates from Classifiers
Summary and Outlook
Chapter 3: Unsupervised Learning and Preprocessing
Types of Unsupervised Learning
Challenges in Unsupervised Learning
Preprocessing and Scaling
Dimensionality Reduction, Feature Extraction, and Manifold Learning
Clustering
Summary and Outlook
Chapter 4: Representing Data and Engineering Features
Categorical Variables
Binning, Discretization, Linear Models, and Trees
Interactions and Polynomials
Univariate Nonlinear Transformations
Automatic Feature Selection
Utilizing Expert Knowledge
Summary and Outlook
Chapter 5: Model Evaluation and Improvement
Cross-Validation
Grid Search
Evaluation Metrics and Scoring
Summary and Outlook
Chapter 6: Algorithm Chains and Pipelines
Parameter Selection with Preprocessing
Building Pipelines
Using Pipelines in Grid Searches
The General Pipeline Interface
Grid-Searching Preprocessing Steps and Model Parameters
Grid-Searching Which Model To Use
Summary and Outlook
Chapter 7: Working with Text Data
Types of Data Represented as Strings
Example Application: Sentiment Analysis of Movie Reviews
Representing Text Data as a Bag of Words
Stopwords
Rescaling the Data with tf–idf
Investigating Model Coefficients
Bag-of-Words with More Than One Word (n-Grams)
Advanced Tokenization, Stemming, and Lemmatization
Topic Modeling and Document Clustering
Summary and Outlook
Chapter 8: Wrapping Up
Approaching a Machine Learning Problem
From Prototype to Production
Testing Production Systems
Building Your Own Estimator
Where to Go from Here
Conclusion
Andreas C. Müller
Andreas Müller received his PhD in machine learning from the University of Bonn. After working as a machine learning researcher on computer vision applications at Amazon for a year, he recently joined the Center for Data Science at New York University. For the last four years, he has been a maintainer and one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, and an author of and contributor to several other widely used machine learning packages. His mission is to create open tools to lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.
Sarah Guido
Sarah is a data scientist who has spent a lot of time working in start-ups. She loves Python, machine learning, large quantities of data, and the tech world. She is an accomplished conference speaker, currently resides in New York City, and attended the University of Michigan for grad school.
Colophon
The animal on the cover of Introduction to Machine Learning with Python is a hellbender salamander (Cryptobranchus alleganiensis), an amphibian native to the eastern United States (ranging from New York to Georgia). It has many colorful nicknames, including "Allegheny alligator," "snot otter," and "mud-devil." The origin of the name "hellbender" is unclear: one theory is that early settlers found the salamander's appearance unsettling and supposed it to be a demonic creature trying to return to hell.
The hellbender salamander is a member of the giant salamander family, and can grow as large as 29 inches long. This is the third-largest aquatic salamander species in the world. Their bodies are rather flat, with thick folds of skin along their sides. While they do have a single gill on each side of the neck, hellbenders largely rely on their skin folds to breathe: gas flows in and out through capillaries near the surface of the skin.
Because of this, their ideal habitat is in clear, fast-moving, shallow streams, which provide plenty of oxygen. The hellbender shelters under rocks and hunts primarily by sense of smell, though it is also able to detect vibrations in the water. Its diet is made up of crayfish, small fish, and occasionally the eggs of its own species. The hellbender is also a key member of its ecosystem as prey: predators include various fish, snakes, and turtles.
Hellbender salamander populations have decreased significantly in the last few decades. Water quality is the largest issue, as their respiratory system makes them very sensitive to polluted or murky water. An increase in agriculture and other human activity near their habitat means greater amounts of sediment and chemicals in the water. In an effort to save this endangered species, biologists have begun to raise the amphibians in captivity and release them when they reach a less vulnerable age.
Many of the animals on O'Reilly covers are endangered; all of them are important to the world. To learn more about how you can help, go to animals.oreilly.com.
The cover image is from Wood's Animate Creation.
