Data Analysis with Open Source Tools
A hands-on guide for programmers and data scientists
Publisher: O'Reilly Media
Final Release Date: November 2010
Pages: 540

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.

  • Use graphics to describe data with one, two, or dozens of variables
  • Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
  • Mine data with computationally intensive methods such as simulation and clustering
  • Make your conclusions understandable through reports, dashboards, and other metrics programs
  • Understand financial calculations, including the time-value of money
  • Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
  • Become familiar with different open source programming environments for data analysis

"Finally, a concise reference for understanding how to conquer piles of data."--Austin King, Senior Web Developer, Mozilla

"An indispensable text for aspiring data scientists."--Michael E. Driscoll, CEO/Founder, Dataspora

Customer Reviews

Review Snapshot (by PowerReviews)

3.6 out of 5 (based on 15 reviews)

Ratings Distribution

  • 5 Stars (3)
  • 4 Stars (7)
  • 3 Stars (2)
  • 2 Stars (2)
  • 1 Star (1)

80% of respondents would recommend this to a friend.

Pros

  • Helpful examples (10)
  • Accurate (8)
  • Well-written (8)
  • Concise (6)
  • Easy to understand (6)

Cons

  • Too many errors (3)

Best Uses

  • Intermediate (9)
  • Novice (8)
  • Expert (4)
  • Student (4)

Reviewer Profile

  • Developer (12)
  • Designer (4)

Reviewed by 15 customers

Displaying reviews 1-10

 
4.0

Good Survey Style Book for Data Analysis

By Ricky Rick

from Boston, MA

About Me Designer, Developer, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Not comprehensive enough

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Media Data Analysis with Open Source Tools:

I think that this is overall a really good book on data analysis. The author draws on his own experience, which adds a lot of value. I also like how he covers a lot of ground on different topics and really tries to clarify the concepts within the text. Overall, I'd definitely recommend this book to someone who wants a good introduction to all those topics. This isn't a reference book or technical manual.

A weak point would be some of the examples, but then I don't think that is the main focus of the book. He does cover a lot of different technologies. If a reader is interested in a specific case, I'd say grab an additional book that is more detailed on the subject (like an R or Python book, which can add value).

I think this book is most useful to someone who has a basic understanding of statistics and programming and who is familiar with, or has at least heard of, the concepts in the book. When I first read this book 2-3 years ago, I didn't understand it that well. Now, after being a developer for some time and having studied statistics on my own, I can really appreciate this text more.

(2 of 2 customers found this review helpful)

 
4.0

Best intro to data analysis around

By Ted Dunning

from San Jose, CA

About Me Designer, Developer, Maker

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Real
  • Well-written

Cons

    Best Uses

    • Intermediate
    • Novice

    Comments about O'Reilly Media Data Analysis with Open Source Tools:

    This book is the best I have seen for giving a concise and useful introduction to data analysis for developers. It gives practical examples of solutions to important problems. Now that the author has made code available on-line, the few complaints that others have raised have been resolved.

    I have plenty of books with all the math. And there are plenty of books that cover these issues at a fluffy level. This book strikes a balance that is likely to make it much more useful to many more people. Frankly, I think it is better than either of the books I have written.

    This book is good enough that I have not only recommended it to others, I have given it to others. Yeah... it is good enough to put my own money down.

    Go get it.

    (2 of 2 customers found this review helpful)

     
    5.0

    just 'excellent'

    By Old Codger

    from San Diego, CA

    About Me Designer, Developer, Maker

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Expert
      • Intermediate
      • Novice
      • Student

      Comments about O'Reilly Media Data Analysis with Open Source Tools:

      This book is very much in line with what an O'Reilly text is all about. Clear, concise explanatory narrative with examples to illustrate and aid complete understanding. Not only is the author good, but the editors seem to have disciplined the flow of the book well also.
      I have nothing but praise for the way the author presents difficult things in easy terms. The author has a very clear understanding of the subjects discussed, so his explanations are clear and have made some things that were very difficult for me to comprehend for many years immediately obvious. Thank you, thank you all.

      (5 of 5 customers found this review helpful)

       
      5.0

      Full of wisdom, not just techniques

      By Brice

      from Paris, France

      About Me Developer, Sys Admin

      Verified Reviewer

      Pros

      • Accurate
      • Concise
      • Enlightening
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Expert
        • Intermediate
        • Novice
        • Student

        Comments about O'Reilly Media Data Analysis with Open Source Tools:

        This book is enlightening. Not only does it discuss the techniques in sufficient detail, but it also gives a clear idea of what is behind them and how to use them. It really enables one to extract meaning out of the data.

        As an example, chapter 9 about statistics is not your usual enumeration of p-values, t-tests and the like. Instead, it says everything that is NOT in your usual textbook. This chapter starts with this: "[here], I want to explain what classical statistics does, why it is the way it is, and what it is good for". The author then goes on to explain the design goals of the usual methods. From those design goals, he draws conclusions about the situations in which the methods are useful, and about their limitations. This gives much more insight than what I had read before.

        And the whole book is this way: full of rare but very useful information, and wisdom. It's also relatively easy to read (some of the most difficult parts are optional and marked as such) and contains many references.

        However, it is not so much about software tools, though there are examples of how to use some of them.

        (3 of 3 customers found this review helpful)

         
        4.0

        A complete voyage into Data Analysis

        By Zalakain

        from Iruña, Nafarroa

        About Me Designer, Developer

        Pros

        • Accurate
        • Complete
        • Helpful examples

        Cons

        • Difficult to understand
        • Not easy on ebook format

        Best Uses

        • Expert
        • Reference

        Comments about O'Reilly Media Data Analysis with Open Source Tools:

        Very good and comprehensive exercise by Mr Janert on how to produce "readable" graphs (read: information) on top of massive data volumes, all with open source tools. This is not a book showing samples or "how-to" code that you can run easily on your app (HTML- or OS-based). Instead it goes much deeper than that, explaining the math that supports the data analysis, lots of the statistical theory underlying the data analysis processes, etc. But this is not a book to read in ebook format during commutes or trips; it should be read at home or in a quiet place instead, giving it the care and attention it deserves.

        There are many good things about it too: the Workshops provided are very good step-by-step descriptions of the process taken by Mr Janert to solve them. Given that the subject of the book is dense, this seems like the best way to help the reader understand what has been discussed.

        Many kinds of graphs (like jitter plots, scatter plots, mosaic plots, Kohonen maps, etc.) and the logic underlying them (logarithms, Pareto, regression, estimations, Monte Carlo simulation, etc.) are covered in this book. So I find it a great source of information that can serve as a superb reference book when developing projects that require graphical analysis tools on big volumes of data.

        (3 of 3 customers found this review helpful)

         
        5.0

        Complex Subject Matter Clearly Explained

        By doug

        from San Francisco

        About Me Developer

        Verified Reviewer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Expert
          • Intermediate
          • Novice
          • Student

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          I used this book as a reference, but it's not a 'Cookbook'. In fact, there are plenty of recipes for any of the techniques in this book available on the Net. What I don't often find there are well-written explanations of these techniques in sufficient detail to allow you to sit down and code your own implementation from start to finish. This is the real value of DAWOST. The explanations of Kohonen Maps and Discrete-Event Simulation are particularly excellent.

          (4 of 6 customers found this review helpful)

           
          4.0

          Data Analysis w/ Professional Experience

          By Eder Andres Avila

          from Paipa, Colombia

          About Me University Student

          Verified Reviewer

          Pros

          • Accurate
          • Well-written

          Cons

            Best Uses

            • Intermediate
            • Novice
            • Student

            Comments about O'Reilly Media Data Analysis with Open Source Tools:

            This is a book about how to design a strategy for understanding an organization's collected data using statistical, graphical, analytical and reporting methods and open source tools. This book explains the major concerns about how to extract the information that the data tries to show about products, finances, processes and others. For that purpose, every information engineer should consider:

            • The underlying properties of data
            • The ways to represent the current status of the data
            • The criteria to select relevant data and attributes
            • The algorithms to analyze the selected data and attributes
            • The ways to report the conclusions of the performed data analysis.

            The author, Philipp K. Janert, takes a designer approach rather than an implementer approach. That means that you will gain important suggestions and tips for proposing a data analysis plan, rather than instructions on how to build an entire or partial information infrastructure using open source tools like Python, R, PostgreSQL and Weka.

            As a result, some developers may be disappointed by the lack of full programming constructs. However, I feel that Philipp K. Janert's main goal is to share with us his own professional experiences in real-world enterprise analytical projects from a requirements perspective. In fact, many reference and recipe books cover the aforementioned open source technologies in depth, so you can start to build a data analysis subsystem from scratch, but without this book you may lack the enterprise's point of view, something much more related to data architecture and data policies.

            Although the implementer approach is not fully covered, you'll be able to understand how the analytical demands can be satisfied using specifically the programming languages Python and R, given their speed of execution, numerical analysis capabilities and cross-platform support. Each chapter contains both Philipp K. Janert's professional experience and the core programming snippets that make such concepts a programming asset.

            In conclusion, while this book will not guide you to develop a data analysis tool with all the specific programming details of Python and R, you will gain valuable professional experience for designing strategies, architectures and policies for data analysis.

            This review was written for the O'Reilly Bloggers Review Program (oreilly.com/bloggers).

            (5 of 7 customers found this review helpful)

             
            3.0

            Data Analysis with Open Source Tools

            By Levon

            from Long Island, NY

            About Me Developer

            Verified Reviewer

            Pros

            • Easy to understand
            • Helpful examples

            Cons

            • Too many errors

            Best Uses

            • Novice

            Comments about O'Reilly Media Data Analysis with Open Source Tools:

            Data Analysis with Open Source Tools is an excellent primer for those who need an overview of the field of Data Analysis, along with pointers to some of the most popular free and open source tools.
            The book does have its shortcomings. There are places where the text is either confusing to read or just wrong, and there should be a website with links to all of the tools presented, along with the data used in the examples, to make it easier for readers to follow along and to jump-start their playing with the tools presented. The presentation of a set of analytic techniques, followed by a workshop where one of the tools is introduced to work through a real example, is a strong point of this book. It both gives an opportunity to see the techniques in action and provides some pointers for those wanting more hands-on experience with a set of techniques. I also appreciated the informal description of classical statistics to help provide context to the subject. This is something that should accompany any introductory text on classical statistics. Overall, I would recommend this as the best primer on Data Analysis techniques that I have come across, but it still falls short of being an excellent book.

            (4 of 5 customers found this review helpful)

             
            3.0

            Decent overview of data analysis

            By Robert, a software engineer

            from Royersford, PA

            About Me Developer

            Verified Reviewer

            Pros

            • Helpful examples
            • Thorough

            Cons

            • Difficult to understand
            • Not enough examples

            Best Uses

            • Intermediate

            Comments about O'Reilly Media Data Analysis with Open Source Tools:

            If you are expecting a book filled with examples of NoSQL databases like Hadoop and Cassandra, you are definitely going to be disappointed. The key with this book is to look at the cover. Data analysis is the main point of this book, and open source tools are really just a nice sidebar. The data analysis information is fairly solid and ranges from some basic methods, through statistics and eventually getting to some machine learning methods like clustering and categorization. The open source tools portions of the book are based on examples of the various analysis methods, but do not delve too deeply into how the tools work.

            Some people may think that some important statistical methods were missing, but the author follows each chapter with recommended resources. These recommendations end up being a huge collection of excellent books that you could review for deeper treatment of the various topics.

            The appendices are fantastic. The first talks more about programming tools, the second gives a nice overview of some of the calculus used in the book, and the third talks about where to get the data you want to work with and how to work with it, like cleaning the data and normalizing it. In some cases, you may even want to start with the appendices before getting into the meat of the book.

            (3 of 3 customers found this review helpful)

             
            4.0

            assorted techniques for large data sets

            By Morris

            from New York City

            Verified Reviewer

            Comments about O'Reilly Media Data Analysis with Open Source Tools:

            This book has a bunch of examples of using various tools to try to learn something useful from large sets of data.

            I often find myself in a situation where I have some millions of records of something, and I have to figure out what is going on. Much of the book is examples drawn from projects that the author worked on, but I find his ideas useful in my work.

            The book is not a step-by-step cookbook teaching you how to use various tools. It is much more about giving you ideas of what you might want to do, and then at the end of the chapter giving a short discussion on what tools the author has experience with for that type of work.

            I would use this book to answer the question: "What type of analysis might be useful?" but not the question "How exactly do I perform the analysis using [GNUPlot or R or whatever]?"

            Buying Options
            Ebook: $33.99
            Formats:  APK, DAISY, ePub, Mobi, PDF
            Print & Ebook: $43.99
            Print: $39.99