Doing Data Science
Straight Talk from the Frontline
Publisher: O'Reilly Media
Released: October 2013
Pages: 406

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

  • Statistical inference, exploratory data analysis, and the data science process
  • Algorithms
  • Spam filters, Naive Bayes, and data wrangling
  • Logistic regression
  • Financial modeling
  • Recommendation engines and causality
  • Data visualization
  • Social networks and data journalism
  • Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Customer Reviews

4.3

(based on 9 reviews)

Ratings Distribution

  • 5 Stars (4)
  • 4 Stars (4)
  • 3 Stars (1)
  • 2 Stars (0)
  • 1 Star (0)

89% of respondents would recommend this to a friend.

Pros

  • Easy to understand (6)
  • Helpful examples (6)
  • Well-written (6)
  • Accurate (3)

Best Uses

  • Student (7)
  • Novice (6)
  • Intermediate (5)

Reviewer Profile

  • Developer (5)

    Reviewed by 9 customers

     
    4.0

    I've assigned it as a textbook

    By RCprofessor

    from San Diego, CA

    About Me Educator

    Verified Reviewer

    Pros

    • Diverse Perspectives
    • Easy to understand
    • Helpful examples

    Cons

    • Errors In Code

    Best Uses

    • Novice
    • Student

    Comments about oreilly Doing Data Science:

    I have assigned it as the "primary" textbook in my big data course at U Calif. I will report on the experience in a few months. For the actual code and statistical analysis, I'm using a book associated with a Stanford online course.
    A word of caution: the very first example, on page 39 (of hardcopy version), has a nonexistent URL to download data from. Fortunately, O'Reilly's Github page has the necessary data, but it took a while to find.
    github.com/oreillymedia/doing_data_science
    Other reviewers say there are more errors. Put out some errata, please.

     
    4.0

    Thanks for writing this book

    By Mary Anne

    from Portland, Oregon

    About Me Data Scientist

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate

      Comments about oreilly Doing Data Science:

      The book describes and prescribes how to do data science. It isn't a how-to manual, and it isn't for beginners. There are plenty of references to good beginner materials. The R and Python code provides examples of how to go about doing data science.
      I received a review copy of this book. I am very pleased to have read it. The book How to do Data Science succinctly describes topics that I have been trying to get across to people.

      (4 of 4 customers found this review helpful)

       
      3.0

      Good but Kindle version is unreadable

      By Jerry

      from Seattle, WA

      About Me Developer

      Verified Buyer

            Comments about oreilly Doing Data Science:

            This is a great book but unfortunately the Kindle version has many issues with the formatting of formulas (sometimes a formula takes half a page, sometimes it is so small as to be unreadable). I will ask for a refund and get the print version instead.

            If O'Reilly wants to be taken seriously as an ebook publisher, you need to improve your quality assurance process. Please have an actual human go through each book and make sure everything is readable on every device you claim to support. Stop wasting your customers' time by publishing unreadable ebooks.

             
            4.0

            great for starters

            By olenaG

            from Melbourne, Australia

            About Me Developer, Maker

            Verified Buyer

            Pros

            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Intermediate
              • Novice
              • Student

              Comments about oreilly Doing Data Science:

              Easy to read. Enough math to give an intuition behind the theory. Great examples. Covers a great range of material and makes you want to explore further yourself.

              (2 of 2 customers found this review helpful)

               
              5.0

              Best guide to data science projects

              By ArthurZ

              from Toronto, ON, Canada

              About Me Database Engineer, Developer

              Verified Reviewer

              Pros

              • Covers a lot of ground
              • Helpful examples

              Cons

              • Difficult to understand

              Best Uses

              • Mature Professional
              • Student

              Comments about oreilly Doing Data Science:

              Of the books I have read recently, this is the most difficult to digest and comprehend, yet it is fun at the same time. I suppose I have myself to blame: the book unexpectedly turned out to come more from the academic world than the practical one, and my skills in algebra and statistics have faded over time. Still, it was pleasant to feel like a student again.

              Nevertheless, the book offers a ton of insight and how-tos for practitioners "in the trenches." It is full of external references and facts; it surely took the authors a while to assemble.

              From my observations, knowledge of the R language is necessary before you start reading. Sadly, although program code is provided in the book, there is no sample output.

              The book includes chapters by guest authors, which makes sense because a data project is rarely staffed by only one kind of professional; the book covers this nuance as well.

              These guest authors are top-notch professionals who could each write a complete book on their own area of expertise. Because they are the "top guns" in their fields, each manages to cover a lot of ground within a single dedicated chapter.

              In short, the best thing about this book is that a single investment gives you comprehensive, lasting coverage of which approach or algorithm to use for a given data science task. You should feel more secure after reading it and, as a result, more eager and ready to embark on any data science project.

              Five out of five stars.

              Disclaimer: I received this book for free as part of O'Reilly Blogger Review program.

               
              4.0

              A broad study with significant depth

              By scalene

              from Franklin, NH

              About Me Developer

              Verified Buyer

              Pros

              • Easy to understand
              • Helpful examples
              • Well-written

              Cons

                Best Uses

                • Novice
                • Student

                Comments about oreilly Doing Data Science:

                I am using this book as a way into understanding data science from the perspective of a database programmer interested in broadening his reach. After a few months I am only four chapters in, because I have taken the authors' advice and begun to learn a little R programming and refresh my probability knowledge. It has been an expansive study, which I am enjoying. It may become quite useful in my current work. So far, no negatives. There are plenty of practical, useful reference links in the eText.

                (2 of 2 customers found this review helpful)

                 
                5.0

                Excellent, very well written book

                By Biraja Ghoshal

                from London, UK

                About Me Designer

                Verified Reviewer

                Pros

                • Accurate
                • Concise
                • Easy to understand
                • Well-written

                Cons

                • E2e Example With Output

                Best Uses

                • Expert
                • Intermediate
                • Novice
                • Student

                Comments about oreilly Doing Data Science:

                This book defines data science as a discipline that learns from experience.

                It would be nice if it presented with:

                a. the output / result set / graph etc. and contained more discussion on outcome analysis

                b. more discussion of how to know when to believe the resulting model, how to judge quality with output/example

                c. Time Series Analysis [SARIMA(X) / Holt-Winters] / Forecasting & Monte Carlo Simulation techniques

                d. Multilevel Modeling of Hierarchical and Longitudinal Data

                e. Data pre & post processing / Regularization / feature selection etc.

                f. HyperCube / SVM based segmentation with complete example

                In summary, this book will be an everyday reference for me as I seek to master these skills. Every time I reread a chapter I gain a new insight or understand a little better.

                (2 of 5 customers found this review helpful)

                 
                5.0

                Great for Analytics

                By analytics guru

                from new york, new york

                About Me Analytics

                Verified Reviewer

                Pros

                • Accurate
                • Well-written

                Cons

                  Best Uses

                  • Intermediate
                  • Novice
                  • Student

                  Comments about oreilly Doing Data Science:

                  Great for analytics people who have been wondering what data science is about. I think the authors establish for me that there is a new type of work here that needs to be done and that analytics people could benefit from learning it.

                  (5 of 8 customers found this review helpful)

                   
                  5.0

                  Thoughtful book

                  By interested

                  from austin, tx

                  About Me Developer

                  Verified Reviewer

                  Pros

                  • Easy to understand
                  • Helpful examples
                  • Well-written

                  Cons

                    Best Uses

                    • Intermediate
                    • Novice
                    • Student

                    Comments about oreilly Doing Data Science:

                    Ambitious book. Tries to cover a lot of ground. Very readable and easy to follow. Well-written and explains difficult concepts well. Perhaps does not go in depth or provide enough technical background at times. In some ways like reading a novel and less like a technical book, and quirky. But it fit what I was looking for, which was something to give me an overview of the field.

                    Buying Options
                    Ebook: $31.99
                    Formats:  ePub, Mobi, PDF
                    Print & Ebook: $43.99
                    Print: $39.99