Think Stats
Publisher: O'Reilly Media
Release Date: July 2011
Pages: 138
If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.
You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.
 Develop your understanding of probability and statistics by writing and testing code
 Run experiments to test statistical behavior, such as generating samples from several distributions
 Use simulations to understand concepts that are hard to grasp mathematically
 Learn topics not usually covered in an introductory course, such as Bayesian estimation
 Import data from almost any source using Python, rather than being limited to data that has been cleaned and formatted for statistics tools
 Use statistical inference to answer questions about real-world data
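The bullets above describe learning statistics by running experiments in code rather than deriving results on paper. A minimal sketch of that idea follows. It is not code from the book: it assumes only Python's standard `random` and `statistics` modules, and the helper name `sample_stats` is hypothetical. It draws samples from several distributions and compares each sample mean with the mean that theory predicts.

```python
import math
import random
import statistics


def sample_stats(draw, n=10_000, seed=17):
    """Hypothetical helper: draw n values with `draw(rng)` and return (mean, stdev)."""
    rng = random.Random(seed)  # seeded so repeated runs give identical samples
    values = [draw(rng) for _ in range(n)]
    return statistics.mean(values), statistics.stdev(values)


# Each experiment: a label, a sampler taking a Random instance, and the theoretical mean.
experiments = [
    ("exponential (lambda=2)", lambda rng: rng.expovariate(2.0), 1 / 2.0),
    ("normal (mu=0, sigma=1)", lambda rng: rng.gauss(0.0, 1.0), 0.0),
    # Lognormal mean is exp(mu + sigma**2 / 2).
    ("lognormal (mu=0, sigma=0.5)", lambda rng: rng.lognormvariate(0.0, 0.5), math.exp(0.5**2 / 2)),
    ("uniform (0, 1)", lambda rng: rng.random(), 0.5),
]

if __name__ == "__main__":
    for name, draw, expected_mean in experiments:
        mean, sd = sample_stats(draw)
        print(f"{name:28s} sample mean={mean:.3f} (theory {expected_mean:.3f}), sd={sd:.3f}")
```

With 10,000 draws the sample means land close to the theoretical values; shrinking `n` and rerunning is a quick way to see how estimates become noisier with smaller samples.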
Table of Contents

Chapter 1 Statistical Thinking for Programmers

Do First Babies Arrive Late?

A Statistical Approach

The National Survey of Family Growth

Tables and Records

Significance

Glossary


Chapter 2 Descriptive Statistics

Means and Averages

Variance

Distributions

Representing Histograms

Plotting Histograms

Representing PMFs

Plotting PMFs

Outliers

Other Visualizations

Relative Risk

Conditional Probability

Reporting Results

Glossary


Chapter 3 Cumulative Distribution Functions

The Class Size Paradox

The Limits of PMFs

Percentiles

Cumulative Distribution Functions

Representing CDFs

Back to the Survey Data

Conditional Distributions

Random Numbers

Summary Statistics Revisited

Glossary


Chapter 4 Continuous Distributions

The Exponential Distribution

The Pareto Distribution

The Normal Distribution

Normal Probability Plot

The Lognormal Distribution

Why Model?

Generating Random Numbers

Glossary


Chapter 5 Probability

Rules of Probability

Monty Hall

Poincaré

Another Rule of Probability

Binomial Distribution

Streaks and Hot Spots

Bayes’s Theorem

Glossary


Chapter 6 Operations on Distributions

Skewness

Random Variables

PDFs

Convolution

Why Normal?

Central Limit Theorem

The Distribution Framework

Glossary


Chapter 7 Hypothesis Testing

Testing a Difference in Means

Choosing a Threshold

Defining the Effect

Interpreting the Result

Cross-Validation

Reporting Bayesian Probabilities

Chi-Square Test

Efficient Resampling

Power

Glossary


Chapter 8 Estimation

The Estimation Game

Guess the Variance

Understanding Errors

Exponential Distributions

Confidence Intervals

Bayesian Estimation

Implementing Bayesian Estimation

Censored Data

The Locomotive Problem

Glossary


Chapter 9 Correlation

Standard Scores

Covariance

Correlation

Making Scatterplots in Pyplot

Spearman’s Rank Correlation

Least Squares Fit

Goodness of Fit

Correlation and Causation

Glossary


Colophon