Data Exploration in Python
Best Practices for Developers
Publisher: O'Reilly Media
Final Release Date: November 2015
Run time: 3 hours 30 minutes

If you're a fledgling data scientist with only cursory statistical training and little experience with real-world data sets, you may feel like you're stumbling around in the dark when you're asked to interpret and present data to decision makers. How do you validate the data? What analytic model should you use? How do you differentiate between correlation and causation? How do you ensure that your data is solid and your conclusions are on target?

Allen Downey, Professor of Computer Science at Olin College of Engineering, author of Think Stats, Think Python, and Think Complexity, provides safe passage around the common pitfalls of exploratory data analysis, so you can manage, analyze, and present data with confidence.

  • Learn the fundamental tools and methodologies used in data science
  • Discover best practices regarding the ETL (Extract, Transform, and Load) process and data validation
  • Use the open science framework: practice version control, replication, and data pipelining
  • Grasp the effectiveness of CDFs (cumulative distribution functions) in visualizing distributions
  • Choose the correct analytic model for your data
  • Comprehend statistical inference, effect size, confidence intervals, and hypothesis testing
  • Discern the relationship between variables: understand scatter plots and scatter plot alternatives
  • Understand correlation, linear least squares, linear regression, and logistic regression
  • Master the Zen of testing your data and your conclusions
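As a rough illustration of the CDF topic in the list above, here is a minimal sketch of an empirical cumulative distribution function using only the Python standard library. It is not taken from the course materials; the function name `empirical_cdf` and the sample values are made up for this example.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Build an empirical CDF from a sequence of numbers.

    The returned function maps x to the fraction of observations
    that are less than or equal to x.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return cdf


# Hypothetical sample, chosen only to keep the arithmetic easy to follow.
sample = [2.0, 3.5, 3.5, 5.0, 8.0]
cdf = empirical_cdf(sample)

print(cdf(1.0))  # no observation is <= 1.0
print(cdf(3.5))  # three of the five observations are <= 3.5
print(cdf(8.0))  # every observation is <= 8.0
```

Plotting the sorted values against their cumulative fractions gives a step curve that shows the whole distribution without requiring a choice of bin width, which is one reason CDFs are often preferred to histograms when comparing groups.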
Customer Reviews




Statistics has never made more sense

By Mahmoud Hashemi

from San Jose, CA

About Me: Developer, Educator

Verified Reviewer


  • Concise
  • Easy to understand
  • Helpful examples


    Best Uses

    • Expert
    • Intermediate
    • Novice
    • Student

    Comments about O'Reilly's Data Exploration in Python:

    To see these two topics, statistics and Python, combined so fluidly and succinctly is an inspiration in itself. Of course, no one is better suited to the task than the author of Think Python and Think Stats, Allen Downey.

    Allen is an expert at choosing simple examples that resonate. The examples never strike one as contrived. The data is often real and always presented cogently. "Foo" and "Bar" may fly in certain circles, but in statistics they only lead to more confusion.

    Anyone who took statistics in school needs a refresher after a few years of real work. Real work and real data have a way of making you wonder if it was really you who passed those statistics exams.

    Seeing how someone experienced masterfully navigates data and code to get at some kernel of truth is something that you can't get in a book. Furthermore, there are a million ways to slice data, and it's too easy to get caught up in the newest tools. The real value for me is seeing which steps Allen takes when, and often more importantly, which steps he skips. A course that simply applies the fundamentals is the booster shot that every practicing engineer needs.

    There are tons of examples and example code. It's readable and makes a fair amount of sense on its own. That might be relevant for some.

    Note that a lot of the statistics are inferential, as opposed to descriptive. But then again, I can't get enough of Allen talking about simply exploring the data. In any case, this is a great class, well worth the less than three hours it takes to watch at 1.25x. Highly recommended to all.

    Video:  $69.99
    (Streaming, Downloadable)