If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.
You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.
Develop your understanding of probability and statistics by writing and testing code
Run experiments to test statistical behavior, such as generating samples from several distributions
Use simulations to understand concepts that are hard to grasp mathematically
Learn topics not usually covered in an introductory course, such as Bayesian estimation
Import data from almost any source using Python, rather than being limited to data that has been cleaned and formatted for statistics tools
Use statistical inference to answer questions about real-world data
Allen Downey is an Associate Professor of Computer Science at the Olin College of Engineering. He has taught computer science at Wellesley College, Colby College and U.C. Berkeley. He has a Ph.D. in Computer Science from U.C. Berkeley and Master’s and Bachelor’s degrees from MIT.
This book is available free online at the author's website in both PDF and HTML formats. I have to admit I feel a bit stupid giving O'Reilly $9 for an ebook that was free to begin with. I should have looked there first.
9/27/2011
4.0
Learn stats by getting your hands dirty
By Louis
from Pittsburgh, PA
About Me: Educator
Pros
Stats through programming
Work with data
Cons
Need a stats reference
Not stand alone
Best Uses
Intermediate
Novice
Student
Comments about O'Reilly Think Stats:
Statistics gets little respect in operations research, in part because it is taught as a bunch of formulas and computer procedures. The problem with teaching it this way is that the formulas don't mean anything to the student. She may know her way around the software menus, but that does not mean she knows which method to use under which circumstances. And everything is learned in isolation, often without practice in getting her hands dirty. Think Stats gives students the chance to get their hands dirty.
Because it uses a programming language (Python), it covers data analysis from beginning to end: viewing data, calculating descriptive statistics, identifying outliers, and describing data with distributions (and explaining what the distributions really mean!). In this small book, the goal is understanding and using statistics, not just learning statistics. I have a number of undergraduate students working on projects with me, and I have started giving them this book when they first join, both to practice programming in Python and to learn statistics and data analysis so they can be useful.
I received a free electronic copy of Think Stats from the O'Reilly Blogger review program.