Doing Data Science
Straight Talk from the Frontline
Publisher: O'Reilly Media
Final Release Date: October 2013
Pages: 406

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

  • Statistical inference, exploratory data analysis, and the data science process
  • Algorithms
  • Spam filters, Naive Bayes, and data wrangling
  • Logistic regression
  • Financial modeling
  • Recommendation engines and causality
  • Data visualization
  • Social networks and data journalism
  • Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Customer Reviews

Review Snapshot (by PowerReviews)

Average rating: 4.3 (based on 11 reviews)

Ratings Distribution

  • 5 Stars (5)
  • 4 Stars (4)
  • 3 Stars (2)
  • 2 Stars (0)
  • 1 Star (0)

82% of respondents would recommend this to a friend.

Pros

  • Easy to understand (7)
  • Helpful examples (7)
  • Well-written (7)
  • Accurate (4)
  • Concise (3)

Cons

Best Uses

  • Novice (8)
  • Student (7)
  • Intermediate (6)

Reviewer Profile

  • Developer (6)

    Reviewed by 11 customers

    Displaying reviews 1-10

    (2 of 2 customers found this review helpful)

     
    3.0

    Failed as a textbook for my course.

    By RCprofessor

    from San Diego, CA

    About Me B-school Professor, Educator

    Pros

    • Genuine Business Apps

    Cons

    • Lightweight Discussions
    • Too basic
    • Uneven

    Best Uses

    • Intro To Business Uses
    • Novice

    Comments about oreilly Doing Data Science:

    On 4/5/2014 I wrote that "I have assigned it as the "primary" textbook in my big data course at U Calif. I will report on the experience in a few months."
    Here's my report: it did not work well. By about week 5 of the course, I stopped assigning it, except as supplemental reading. Here is an explanation of what I found. I'm being harsh because I had hoped the book would work as a textbook. As a general "background reading" book it would have more value.

    The book is based on a series of independent lectures given by various practitioners to the lead author's class. The two authors have done a good job of converting the lectures to text. But most of the lectures were overviews of projects, and did not provide enough information for my students to actually do something similar. Another problem is that inevitably there were both duplications and gaps in coverage. A topic would re-appear many chapters later, with some overlap but no explicit comparison of approaches. Organization of the book seemed to be based on the chronology of the course, and nothing more. For example some chapters had 2 very different topics by two speakers.

    My students are in a professional social science degree MS program, which is like the intended audience (MBA students), but it was still not rigorous enough for them. In the end I was also irritated by the overall "Physics for Poets" tone. In other words, the intended audience was apparently not expected to actually do their own analysis. Instead the course was designed as a survey for business students who "would hire someone to do the actual work for them." For example, many chapters discussed the question "What is Data Science, anyway?" More useful would have been "What kinds of questions can we answer with this approach, and what kinds can't we?"

    There were some valuable discussions in specific chapters. Good chapters included Chapter 11 on Causality (with a nice debunking of some analysis by OK Cupid), and strangely enough Chapter 13 on Data Competitions. On the other hand, a discussion titled "Thought Experiment: What Are the Ethical Implications of a Robo-Grader?" was pointless.

    Writing actual R code: Segments of R code were provided for many lectures, and most were useful. But they were generally not complete, so that students could not use them directly. The Stanford course text that I used was much more useful to teach how to write working code to do actual analysis.

    My plan for next year is to use a book that is more consistent and cumulative, and probably to use an actual textbook. My leading candidate has a variety of tested assignments at the end of each chapter ("problem sets"), with available answers for some of them, which saves me a lot of time and gives the students more feedback.
    This book WOULD be useful for "Physics for Poets" classes. It could also be good for people who have learned some data mining already, and want to read about various business projects at a superficial (but interesting) level. Someone who has been doing analytics for science (e.g., astronomy) and wants to get a job in a company might find it useful to get a sense of the for-profit world. For example, there were speakers from well-known companies, e.g., Facebook.

    (1 of 1 customers found this review helpful)

     
    5.0

    Excellent Summary!!

    By Prometheus

    from Richardson, Tx.

    About Me Designer, Developer, Sys Admin

    Verified Buyer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

    • Digestible
    • Succinct Clarity

    Best Uses

    • Expert
    • Intermediate
    • Novice

    Comments about oreilly Doing Data Science:

    This book explained confusing terminology that is loosely used throughout my IT community. Its definitions are based on common-sense, real-world topics and are suitably digestible for novices and experts.

    This book is a must-read for all Data Science aspirants seeking direction, definitions, and knowledge on this evolving body of knowledge.

    (4 of 4 customers found this review helpful)

     
    4.0

    I've assigned it as a textbook

    By RCprofessor

    from San Diego, CA

    About Me Educator

    Verified Reviewer

    Pros

    • Diverse Perspectives
    • Easy to understand
    • Helpful examples

    Cons

    • Errors In Code

    Best Uses

    • Novice
    • Student

    Comments about oreilly Doing Data Science:

    I have assigned it as the "primary" textbook in my big data course at U Calif. I will report on the experience in a few months. For the actual code and statistical analysis, I'm using a book associated with a Stanford online course.
    A word of caution: the very first example, on page 39 (of hardcopy version), has a nonexistent URL to download data from. Fortunately, O'Reilly's Github page has the necessary data, but it took a while to find.
    github.com/oreillymedia/doing_data_science
    Other reviewers say there are more errors. Put out some errata, please.

    (1 of 2 customers found this review helpful)

     
    4.0

    Thanks for writing this book

    By Mary Anne

    from Portland, Oregon

    About Me Data Scientist

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate

      Comments about oreilly Doing Data Science:

      The book describes and prescribes how to do data science. It isn't a how-to manual, and it isn't for beginners. There are plenty of references to good beginner materials. The R and Python code provides examples of how to go about doing data science.
      I received a review copy of this book. I am very pleased to have read it. Doing Data Science succinctly describes topics that I have been trying to get across to people.

      (10 of 10 customers found this review helpful)

       
      3.0

      Good but Kindle version is unreadable

      By Jerry

      from Seattle, WA

      About Me Developer

      Verified Buyer

      Pros

        Cons

          Best Uses

            Comments about oreilly Doing Data Science:

            This is a great book but unfortunately the Kindle version has many issues with the formatting of formulas (sometimes a formula takes half a page, sometimes it is so small as to be unreadable). I will ask for a refund and get the print version instead.

            If O'Reilly wants to be taken seriously as an ebook publisher, you need to improve your quality assurance process. Please have an actual human go through each book and make sure everything is readable on every device you claim to support. Stop wasting your customers' time by publishing unreadable ebooks.

             
            4.0

            great for starters

            By olenaG

            from Melbourne, Australia

            About Me Developer, Maker

            Verified Buyer

            Pros

            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Intermediate
              • Novice
              • Student

              Comments about oreilly Doing Data Science:

              Easy to read. Enough math to give an intuition behind the theory. Great examples. Covers a great range of material and makes you want to explore further yourself.

              (2 of 3 customers found this review helpful)

               
              5.0

              Best guide to Data Science projects

              By ArthurZ

              from Toronto, ON, Canada

              About Me Database Engineer, Developer

              Verified Reviewer

              Pros

              • Covers a lot of ground
              • Helpful examples

              Cons

              • Difficult to understand

              Best Uses

              • Mature Professional
              • Student

              Comments about oreilly Doing Data Science:

              Of the books I have read recently, this is the most difficult to digest and comprehend, though it is fun at the same time. I suppose I have myself to blame: the book unexpectedly turned out to be more from the academic world than the practical one, and my skills in algebra and statistics have faded over time. At the same time, it was pleasant to feel like a student again.

              Nevertheless, the book offers a ton of insight and how-tos for practitioners "in the trenches." It is full of external references and facts; it surely took the authors a while to assemble.

              From my observations, knowledge of the R language is necessary before you start reading. Sadly, even where program code is provided in the book, there is no sample output.

              The book has chapters by guest authors, which makes sense because a data project rarely involves just one kind of professional. The book covers this nuance as well, by the way.

              These guest authors are top-notch professionals who could each write a complete book on their own area of expertise. Because they are the "top guns" in their respective fields, each managed to cover a lot of ground within a single dedicated chapter.

              So, in short, the best thing about this book is that in one single investment you get comprehensive, lasting coverage of which approach or algorithm to use for a given data science task. You should feel more secure after reading this book and, as a result, be more eager and ready to embark on any data science project.

              Five out of five stars.

              Disclaimer: I received this book for free as part of the O'Reilly Blogger Review program.

               
              4.0

              A broad study with significant depth

              By scalene

              from Franklin, NH

              About Me Developer

              Verified Buyer

              Pros

              • Easy to understand
              • Helpful examples
              • Well-written

              Cons

                Best Uses

                • Novice
                • Student

                Comments about oreilly Doing Data Science:

                I am using this book as a way into understanding Data Science from the perspective of a database programmer interested in broadening his reach. After a few months I am only 4 chapters in because I have taken the authors' advice and begun to learn a little R programming and refresh my probability knowledge. It has been an obviously expansive study which I am enjoying. It may become quite useful to some extent in my current work. So far, no negatives. There are plenty of practical, useful reference links in the eText.

                (2 of 2 customers found this review helpful)

                 
                5.0

                Excellent, very well written book

                By Biraja Ghoshal

                from London, UK

                About Me Designer

                Verified Reviewer

                Pros

                • Accurate
                • Concise
                • Easy to understand
                • Well-written

                Cons

                • E2e Example With Output

                Best Uses

                • Expert
                • Intermediate
                • Novice
                • Student

                Comments about oreilly Doing Data Science:

                This book defines data science as a discipline that learns from experience.

                It would be nice if it included:

                a. the output / result set / graph etc. and contained more discussion on outcome analysis

                b. more discussion of how to know when to believe the resulting model, how to judge quality with output/example

                c. Time Series Analysis [SARIMA(X) / Holt-Winters] / Forecasting & Monte Carlo Simulation techniques

                d. Multilevel Modeling of Hierarchical and Longitudinal Data

                e. Data pre & post processing / Regularization / feature selection etc.

                f. HyperCube / SVM based segmentation with complete example

                In summary, this book will be an everyday reference for me as I seek to master these skills. Every time I reread a chapter I gain a new insight or understand a little better.

                (2 of 5 customers found this review helpful)

                 
                5.0

                Great for Analytics

                By analytics guru

                from new york, new york

                About Me Analytics

                Verified Reviewer

                Pros

                • Accurate
                • Well-written

                Cons

                  Best Uses

                  • Intermediate
                  • Novice
                  • Student

                  Comments about oreilly Doing Data Science:

                  Great for analytics people who have been wondering what data science is about. I think the authors establish for me that there is a new type of work here that needs to be done and that analytics people could benefit from learning it.

                  Buying Options
                  Ebook: $38.99
                  Formats:  DAISY, ePub, Mobi, PDF
                  Print & Ebook: $49.49
                  Print: $44.99