Head First Statistics
A Brain-Friendly Guide
Publisher: O'Reilly Media
Release Date: June 2009
Pages: 718
Whether you're a student, a professional, or just curious about statistical analysis, Head First's brain-friendly formula helps you get a firm grasp of statistics so you can understand key points and actually use them. Learn to present data visually with charts and plots; discover the difference between taking the average with mean, median, and mode, and why it's important; learn how to calculate probability and expectation; and much more.
Head First Statistics is ideal for high school and college students taking statistics and satisfies the requirements for passing the College Board's Advanced Placement (AP) Statistics Exam. With this book, you'll:
 Study the full range of topics covered in first-year statistics
 Tackle tough statistical concepts using Head First's dynamic, visually rich format proven to stimulate learning and help you retain knowledge
 Explore real-world scenarios, ranging from casino gambling to prescription drug testing, to bring statistical principles to life
 Discover how to measure spread, calculate odds through probability, and understand the normal, binomial, geometric, and Poisson distributions
 Conduct sampling, use correlation and regression, do hypothesis testing, perform chi-square analysis, and more
Before you know it, you'll not only have mastered statistics, you'll also see how they work in the real world. Head First Statistics will help you pass your statistics course, and give you a firm understanding of the subject so you can apply the knowledge throughout your life.
Table of Contents

Chapter 1 Visualizing Information: First Impressions

Statistics are everywhere

But why learn statistics?

A tale of two charts

Manic Mango needs some charts

The humble pie chart

Chart failure

Bar charts can allow for more accuracy

Vertical bar charts

Horizontal bar charts

It’s a matter of scale

Using frequency scales

Dealing with multiple sets of data

Your bar charts rock

Categories vs. numbers

Dealing with grouped data

To make a histogram, start by finding bar widths

Manic Mango needs another chart

Make the area of histogram bars proportional to frequency

Step 1: Find the bar widths

Step 2: Find the bar heights

Step 3: Draw your chart—a histogram

Histograms can’t do everything

Introducing cumulative frequency

Drawing the cumulative frequency graph

Choosing the right chart

Manic Mango conquered the games market!


Chapter 2 Measuring Central Tendency: The Middle Way

Welcome to the Health Club

A common measure of average is the mean

Mean math

Dealing with unknowns

Back to the mean

Handling frequencies

Back to the Health Club

Everybody was Kung Fu fighting

Our data has outliers

The outliers did it

Watercooler conversation

Finding the median

Business is booming

The Little Ducklings swimming class

Frequency Magnets

Frequency Magnets

What went wrong with the mean and median?

Introducing the mode

Congratulations!


Chapter 3 Measuring Variability and Spread: Power Ranges

Wanted: one player

We need to compare player scores

Use the range to differentiate between data sets

The problem with outliers

We need to get away from outliers

Quartiles come to the rescue

The interquartile range excludes outliers

Quartile anatomy

We’re not just limited to quartiles

So what are percentiles?

Box and whisker plots let you visualize ranges

Variability is more than just spread

Calculating average distances

We can calculate variation with the variance...

...but standard deviation is a more intuitive measure

A quicker calculation for variance

What if we need a baseline for comparison?

Use standard scores to compare values across data sets

Interpreting standard scores

Statsville All Stars win the league!


Chapter 4 Calculating Probabilities: Taking Chances

Fat Dan’s Grand Slam

Roll up for roulette!

Your very own roulette board

Place your bets now!

What are the chances?

Find roulette probabilities

You can visualize probabilities with a Venn diagram

It’s time to play!

And the winning number is...

Let’s bet on an even more likely event

You can also add probabilities

You win!

Time for another bet

Exclusive events and intersecting events

Problems at the intersection

Some more notation

Another unlucky spin...

...but it’s time for another bet

Conditions apply

Find conditional probabilities

You can visualize conditional probabilities with a probability tree

Trees also help you calculate conditional probabilities

Bad luck!

We can find P(Black | Even) using the probabilities we already have

Step 1: Finding P(Black ∩ Even)

So where does this get us?

Step 2: Finding P(Even)

Step 3: Finding P(Black | Even)

These results can be generalized to other problems

Use the Law of Total Probability to find P(B)

Introducing Bayes’ Theorem

We have a winner!

It’s time for one last bet

If events affect each other, they are dependent

If events do not affect each other, they are independent

More on calculating probability for independent events

Winner! Winner!


Chapter 5 Using Discrete Probability Distributions: Manage Your Expectations

Back at Fat Dan’s Casino

We can compose a probability distribution for the slot machine

Expectation gives you a prediction of the results...

... and variance tells you about the spread of the results

Variances and probability distributions

Let’s calculate the slot machine’s variance

Fat Dan changed his prices

There’s a linear relationship between E(X) and E(Y)

Slot machine transformations

General formulas for linear transforms

Every pull of the lever is an independent observation

Observation shortcuts

New slot machine on the block

Add E(X) and E(Y) to get E(X + Y)...

... and subtract E(X) and E(Y) to get E(X – Y)

You can also add and subtract linear transformations

Jackpot!


Chapter 6 Permutations and Combinations: Making Arrangements

The Statsville Derby

It’s a three-horse race

How many ways can they cross the finish line?

Calculate the number of arrangements

Going round in circles

It’s time for the novelty race

Arranging by individuals is different than arranging by type

We need to arrange animals by type

Generalize a formula for arranging duplicates

It’s time for the twenty-horse race

How many ways can we fill the top three positions?

Examining permutations

What if horse order doesn’t matter?

Examining combinations

It’s the end of the race


Chapter 7 Geometric, Binomial, and Poisson Distributions: Keeping Things Discrete

Meet Chad, the hapless snowboarder

We need to find Chad’s probability distribution

There’s a pattern to this probability distribution

The probability distribution can be represented algebraically

The pattern of expectations for the geometric distribution

Expectation is 1/p

Finding the variance for our distribution

You’ve mastered the geometric distribution

Should you play, or walk away?

Generalizing the probability for three questions

Let’s generalize the probability further

What’s the expectation and variance?

Binomial expectation and variance

The Statsville Cinema has a problem

Expectation and variance for the Poisson distribution

So what’s the probability distribution?

Combine Poisson variables

The Poisson in disguise

Anyone for popcorn?


Chapter 8 Using the Normal Distribution: Being Normal

Discrete data takes exact values...

... but not all numeric data is discrete

What’s the delay?

We need a probability distribution for continuous data

Probability density functions can be used for continuous data

Probability = area

To calculate probability, start by finding f(x)...

... then find probability by finding the area

We’ve found the probability

Searching for a sole mate

Male modelling

The normal distribution is an “ideal” model for continuous data

So how do we find normal probabilities?

Three steps to calculating normal probabilities

Step 1: Determine your distribution

Step 2: Standardize to N(0, 1)

To standardize, first move the mean...

... then squash the width

Now find Z for the specific value you want to find probability for

Step 3: Look up the probability in your handy table

Julie’s probability is in the table

And they all lived happily ever after


Chapter 9 Using the Normal Distribution ii: Beyond Normal

Love is a roller coaster

All aboard the Love Train

Normal bride + normal groom

It’s still just weight

How’s the combined weight distributed?

Finding probabilities

More people want the Love Train

Linear transforms describe underlying changes in values...

...and independent observations describe how many values you have

Expectation and variance for independent observations

Should we play, or walk away?

Normal distribution to the rescue

When to approximate the binomial distribution with the normal

Revisiting the normal approximation

The binomial is discrete, but the normal is continuous

Apply a continuity correction before calculating the approximation

All aboard the Love Train

When to approximate the binomial distribution with the normal

A runaway success!


Chapter 10 Using Statistical Sampling: Taking Samples

The Mighty Gumball taste test

They’re running out of gumballs

Test a gumball sample, not the whole gumball population

How sampling works

When sampling goes wrong

How to design a sample

Define your sampling frame

Sometimes samples can be biased

Sources of bias

How to choose your sample

Simple random sampling

How to choose a simple random sample

There are other types of sampling

We can use stratified sampling...

...or we can use cluster sampling...

...or even systematic sampling

Mighty Gumball has a sample


Chapter 11 Estimating Populations and Samples: Making Predictions

So how long does flavor really last for?

Let’s start by estimating the population mean

Point estimators can approximate population parameters

Let’s estimate the population variance

We need a different point estimator than sample variance

Which formula’s which?

Mighty Gumball has done more sampling

It’s a question of proportion

Buy your gumballs here!

So how does this relate to sampling?

The sampling distribution of proportions

So what’s the expectation of Ps?

And what’s the variance of Ps?

Find the distribution of Ps

Ps follows a normal distribution

How many gumballs?

We need probabilities for the sample mean

The sampling distribution of the mean

Find the expectation for X̄

What about the variance of X̄?

So how is X̄ distributed?

If n is large, X̄ can still be approximated by the normal distribution

Using the central limit theorem

Sampling saves the day!


Chapter 12 Constructing Confidence Intervals: Guessing with Confidence

Mighty Gumball is in trouble

The problem with precision

Introducing confidence intervals

Four steps for finding confidence intervals

Step 1: Choose your population statistic

Step 2: Find its sampling distribution

Point estimators to the rescue

We’ve found the distribution for X̄

Step 3: Decide on the level of confidence

How to select an appropriate confidence level

Step 4: Find the confidence limits

Start by finding Z

Rewrite the inequality in terms of μ

Finally, find the value of X̄

You’ve found the confidence interval

Let’s summarize the steps

Handy shortcuts for confidence intervals

Just one more problem...

Step 1: Choose your population statistic

Step 2: Find its sampling distribution

X̄ follows the t-distribution when the sample is small

Find the standard score for the t-distribution

Step 3: Decide on the level of confidence

Step 4: Find the confidence limits

Using t-distribution probability tables

The t-distribution vs. the normal distribution

You’ve found the confidence intervals!


Chapter 13 Using Hypothesis Tests: Look At The Evidence

Statsville’s new miracle drug

So what’s the problem?

Resolving the conflict from 50,000 feet

The six steps for hypothesis testing

Step 1: Decide on the hypothesis

So what’s the alternative?

Step 2: Choose your test statistic

Step 3: Determine the critical region

To find the critical region, first decide on the significance level

Step 4: Find the p-value

We’ve found the p-value

Step 5: Is the sample result in the critical region?

Step 6: Make your decision

So what did we just do?

What if the sample size is larger?

Let’s conduct another hypothesis test

Step 1: Decide on the hypotheses

Step 2: Choose the test statistic

Use the normal to approximate the binomial in our test statistic

Step 3: Find the critical region

SnoreCull failed the test

Mistakes can happen

Let’s start with Type I errors

What about Type II errors?

Finding errors for SnoreCull

We need to find the range of values

Find P(Type II error)

Introducing power

The doctor’s happy


Chapter 14 The χ² Distribution: There’s Something Going On...

There may be trouble ahead at Fat Dan’s Casino

Let’s start with the slot machines

The χ² test assesses difference

So what does the test statistic represent?

Two main uses of the χ² distribution

ν represents degrees of freedom

What’s the significance?

Hypothesis testing with χ²

You’ve solved the slot machine mystery

Fat Dan has another problem

The χ² distribution can test for independence

You can find the expected frequencies using probability

So what are the frequencies?

We still need to calculate degrees of freedom

Generalizing the degrees of freedom

And the formula is...

You’ve saved the casino


Chapter 15 Correlation and Regression: What’s My Line?

Never trust the weather

Let’s analyze sunshine and attendance

Exploring types of data

Visualizing bivariate data

Scatter diagrams show you patterns

Correlation vs. causation

Predict values with a line of best fit

Your best guess is still a guess

We need to minimize the errors

Introducing the sum of squared errors

Find the equation for the line of best fit

Finding the slope for the line of best fit

Finding the slope for the line of best fit, part ii

We’ve found b, but what about a?

You’ve made the connection

Let’s look at some correlations

The correlation coefficient measures how well the line fits the data

There’s a formula for calculating the correlation coefficient, r

Find r for the concert data

Find r for the concert data, continued

You’ve saved the day!

Leaving town...

It’s been great having you here in Statsville!


Appendix Leftovers: The Top Ten Things (we didn’t cover)

#1. Other ways of presenting data

#2. Distribution anatomy

#3. Experiments

Designing your experiment

#4. Least square regression alternate notation

#5. The coefficient of determination

#6. Nonlinear relationships

#7. The confidence interval for the slope of a regression line

#8. Sampling distributions – the difference between two means

#9. Sampling distributions – the difference between two proportions

#10. E(X) and Var(X) for continuous probability distributions

Finding E(X)

Finding Var(X)


Appendix Statistics Tables: Looking Things Up

#1. Standard normal probabilities

#2. t-distribution critical values

#3. χ² critical values
