Chapter: Getting Started


Getting What You Need

Installing Enthought Canopy

Python Basics – Part 1

Python Basics – Part 2

Running Python Scripts

Chapter: Statistics and Probability Refresher, and Python Practice

Types of Data

Mean, Median, and Mode

Using Mean, Median, and Mode in Python

Variation and Standard Deviation

Probability Density Function and Probability Mass Function

Common Data Distributions

Percentiles and Moments

A Crash Course in matplotlib

Covariance and Correlation

Conditional Probability

Exercise Solution – Conditional Probability of Purchase by Age

Bayes' Theorem

Chapter: Predictive Models

Linear Regression

Polynomial Regression

Multivariate Regression and Predicting Car Prices

Multi-Level Models

Chapter: Machine Learning with Python

Supervised versus Unsupervised Learning and Train/Test

Using Train/Test to Prevent Overfitting of a Polynomial Regression

Bayesian Methods – Concepts

Implementing a Spam Classifier with Naive Bayes

K-Means Clustering

Clustering People Based on Income and Age

Measuring Entropy

Decision Trees – Concepts

Decision Trees – Predicting Hiring Decisions

Ensemble Learning

Support Vector Machines (SVM) Overview

Using SVM to Cluster People with scikit-learn

Chapter: Recommender Systems

User-Based Collaborative Filtering

Item-Based Collaborative Filtering

Finding Movie Similarities

Improving the Results of Movie Similarities

Making Movie Recommendations to People

Improving the Recommender's Results

Chapter: More Data Mining and Machine Learning Techniques

K-Nearest Neighbors – Concepts

Using KNN to Predict a Rating for a Movie

Dimensionality Reduction and Principal Component Analysis

A PCA Example with the Iris Dataset

Data Warehousing Overview – ETL and ELT

Reinforcement Learning

Chapter: Dealing with Real-World Data

Bias/Variance Trade-off

K-Fold Cross-Validation to Avoid Overfitting

Data Cleaning and Normalization

Cleaning Web Log Data

Normalizing Numerical Data

Detecting Outliers

Chapter: Apache Spark – Machine Learning on Big Data

Installing Spark – Part 1

Installing Spark – Part 2

Spark Introduction

Spark and the Resilient Distributed Dataset (RDD)

Introducing MLlib

Decision Trees in Spark

K-Means Clustering in Spark

Searching Wikipedia with Spark

Using the Spark 2.0 DataFrame API for MLlib

Chapter: Experimental Design

A/B Testing Concepts

T-Tests and P-Values

Hands On with T-Tests

Determining How Long to Run an Experiment

A/B Test Gotchas

Chapter: You Made It!

More to Explore

