Data Analysis with Open Source Tools
A hands-on guide for programmers and data scientists
Publisher: O'Reilly Media
Released: November 2010
Pages: 540
Customer Reviews

O'Reilly Media Data Analysis with Open Source Tools

3.5 (based on 13 reviews)

Ratings Distribution

  • 5 Stars (3)
  • 4 Stars (5)
  • 3 Stars (2)
  • 2 Stars (2)
  • 1 Star (1)

77% of respondents would recommend this to a friend.

Pros

  • Helpful examples (8)
  • Accurate (6)
  • Well-written (6)
  • Concise (4)
  • Easy to understand (4)

Cons

  • Too many errors (3)

Best Uses

  • Intermediate (7)
  • Novice (6)
  • Expert (4)
  • Student (4)
Reviewer Profile

  • Developer (10)

Reviewed by 13 customers


Displaying reviews 1-10


(1 of 1 customers found this review helpful)

 
5.0

just 'excellent'

By Old Codger

from San Diego, CA

About Me Designer, Developer, Maker

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

    Best Uses

    • Expert
    • Intermediate
    • Novice
    • Student

    Comments about O'Reilly Media Data Analysis with Open Source Tools:

    This book is very much in line with what an O'Reilly text is all about. Clear, concise explanatory narrative with examples to illustrate and aid complete understanding. Not only is the author good, but the editors seem to have disciplined the flow of the book well also.
    I have nothing but praise for the way the author presents difficult things in easy terms. The author has a very clear understanding of the subjects discussed, so his explanations are clear and have made some things that were very difficult for me to comprehend for many years immediately obvious. Thank you, thank you all.

    (3 of 3 customers found this review helpful)

     
    5.0

    Full of wisdom, not just techniques

    By Brice

    from Paris, France

    About Me Developer, Sys Admin

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Enlightening
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Expert
      • Intermediate
      • Novice
      • Student

      Comments about O'Reilly Media Data Analysis with Open Source Tools:

      This book is enlightening. Not only does it discuss the techniques in sufficient detail, but it also gives a clear idea of what is behind them and how to use them. It really enables one to extract meaning out of the data.

      As an example, chapter 9 about statistics is not your usual enumeration of p-values, t-tests and the like. Instead, it says everything that is NOT in your usual textbook. This chapter starts with this: "[here], I want to explain what classical statistics does, why it is the way it is, and what it is good for". Then the author goes on to explain the design goals of the usual methods. From those design goals, he draws conclusions about the situations in which the methods are useful, and about their limitations. This gives much more insight than what I had read before.

      And the whole book is this way: full of rare but very useful information, and wisdom. It's also relatively easy to read (some of the most difficult parts are optional and marked as such) and contains many references.

      However, it is not so much about software tools, though there are examples of how to use some of them.

      (3 of 3 customers found this review helpful)

       
      4.0

      A complete voyage into Data Analysis

      By Zalakain

      from Iruña, Nafarroa

      About Me Designer, Developer

      Pros

      • Accurate
      • Complete
      • Helpful examples

      Cons

      • Difficult to understand
      • Not easy on ebook format

      Best Uses

      • Expert
      • Reference

      Comments about O'Reilly Media Data Analysis with Open Source Tools:

      Very good and comprehensive exercise by Mr Janert on how to produce "readable" graphs (read: information) on top of massive data volumes, all with open source tools. This is not a book showing samples or "how-to" code that you can run easily on your app (HTML- or OS-based). Instead it goes much deeper than that, explaining the math that supports the data analysis, lots of the statistical theory underlying the data analysis processes, etc. This is not a book to read in ebook format during commutes or trips, but one to be read at home or in a quiet place, giving it the care and attention it deserves.

      There are also many good things about it: the Workshops provided are very good step-by-step descriptions of the process taken by Mr Janert to solve them. Given that the subject of the book is dense, this seems like the best way to help the reader understand what has been discussed.

      Many kinds of graphs (like jitter plots, scatter plots, mosaic plots, Kohonen maps, etc.) and the logic underlying them (logarithms, Pareto, regression, estimations, Monte Carlo simulation, etc.) are covered in this book. So I find it a great source of information that can serve as a superb reference book when developing a project requiring graphical analysis tools on big volumes of data.

      (1 of 1 customers found this review helpful)

       
      5.0

      Complex Subject Matter Clearly Explained

      By doug

      from San Francisco

      About Me Developer

      Verified Reviewer

      Pros

      • Accurate
      • Concise
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Expert
        • Intermediate
        • Novice
        • Student

        Comments about O'Reilly Media Data Analysis with Open Source Tools:

        I used this book as a reference, but it's not a 'Cookbook'. In fact, there are plenty of recipes for any of the techniques in this book available on the Net. What I don't often find there are well-written explanations of these techniques in sufficient detail to allow you to sit down and code your own implementation from start to finish. This is the real value of DAWOST. The explanations of Kohonen Maps and Discrete-Event Simulation are particularly excellent.

        (3 of 4 customers found this review helpful)

         
        4.0

        Data Analysis w/ Professional Experience

        By Eder Andres Avila

        from Paipa, Colombia

        About Me University Student

        Verified Reviewer

        Pros

        • Accurate
        • Well-written

        Cons

          Best Uses

          • Intermediate
          • Novice
          • Student

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          This is a book about how to design a strategy for understanding an organization's collected data using statistical, graphical, analytical and reporting methods and open source tools. It explains the major concerns in extracting the information that the data holds about products, finances, processes and more. To that end, every information engineer should consider:

          • The underlying properties of data
          • The ways to represent the current status of the data
          • The criteria to select relevant data and attributes
          • The algorithms to analyze the selected data and attributes
          • The ways to report the conclusions of the performed data analysis.

          The author Philipp K. Janert takes a designer approach rather than an implementer approach. That means that you will gain important suggestions and tips to propose a plan for data analysis, instead of how to build an entire or partial information infrastructure using open source tools like Python, R, PostgreSQL and Weka.

          For some developers, then, the lack of full programming constructs may be disappointing. However, I feel that Philipp K. Janert's main goal is to share with us his own professional experiences in real-world enterprise analytical projects from a requirements perspective. In fact, many reference and recipe books cover the aforementioned open source technologies in depth, so you can start to build a data analysis subsystem from zero. Without this book, however, you may lack the enterprise's point of view, something much more related to data architecture and data policies.

          Although the implementer approach is not fully covered, you'll be able to understand how analytical demands can be satisfied using, specifically, the programming languages Python and R, given their speed of execution, numerical analysis capabilities and cross-platform support. Each chapter contains both Philipp K. Janert's professional experience and the core programming snippets that make such concepts a programming asset.

          In conclusion, while it is true that this book will not guide you to develop a data analysis tool with all the specific programming details of Python and R, it is also true that you will gain worthwhile professional experience for designing strategies, architectures and policies for data analysis.

          This review was written for the O'Reilly Bloggers Review Program (oreilly.com/bloggers).

          (5 of 5 customers found this review helpful)

           
          3.0

          Data Analysis with Open Source Tools

          By Levon

          from Long Island, NY

          About Me Developer

          Verified Reviewer

          Pros

          • Easy to understand
          • Helpful examples

          Cons

          • Too many errors

          Best Uses

          • Novice

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          Data Analysis with Open Source Tools is an excellent primer for those who need an overview of the field of Data Analysis, along with pointers to some of the most popular free and open source tools.
          The book does have its shortcomings. There are places where the text is either confusing to read or just wrong. There should also be a website with links to all of the tools presented, along with the data used in the examples, to make it easier for readers to follow along and to jump-start them playing with the tools. The presentation of a set of analytic techniques, followed by a workshop where one of the tools is introduced to work through a real example, is a strong point of this book. I also appreciated the informal description of classical statistics to help provide context to the subject. This is something that should accompany any introductory text on classical statistics. It both gives an opportunity to see the techniques in action and provides some pointers for those wanting more hands-on experience with a set of techniques. Overall, I would recommend this as the best primer to Data Analysis techniques that I have come across, but it still falls short of being an excellent book.

          (4 of 4 customers found this review helpful)

           
          3.0

          Decent overview of data analysis

          By Robert, a software engineer

          from Royersford, PA

          About Me Developer

          Verified Reviewer

          Pros

          • Helpful examples
          • Thorough

          Cons

          • Difficult to understand
          • Not enough examples

          Best Uses

          • Intermediate

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          If you are expecting a book filled with examples of NoSQL databases like Hadoop and Cassandra, you are definitely going to be disappointed. The key with this book is to look at the cover. Data analysis is the main point of this book, and open source tools are really just a nice sidebar. The data analysis information is fairly solid and ranges from some basic methods, through statistics and eventually getting to some machine learning methods like clustering and categorization. The open source tools portions of the book are based on examples of the various analysis methods, but do not delve too deeply into how the tools work.

          Some people may think that some important statistical methods were missing, but the author follows each chapter with recommended resources. These recommendations end up being a huge collection of excellent books that you could review for deeper treatment of the various topics.

          The appendices are fantastic. The first talks more about programming tools, the second gives a nice overview of some of the calculus used in the book, and the third talks about where to get the data you want to work with and how to work with it, like cleaning the data and normalizing it. In some cases, you may even want to start with the appendices before getting into the meat of the book.

          (3 of 3 customers found this review helpful)

           
          4.0

          assorted techniques for large data sets

          By Morris

          from New York City

          Verified Reviewer

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          This book has a bunch of examples of using various tools to try to learn something useful from large sets of data.

          I often find myself in a situation where I have some millions of records of something, and I have to figure out what is going on. Much of the book consists of examples drawn from projects that the author worked on, but I find his ideas useful in my work.

          The book is not a step-by-step cookbook teaching you how to use various tools. It is much more about giving you ideas of what you might want to do, and then at the end of the chapter giving a short discussion on what tools the author has experience with for that type of work.

          I would use this book to answer the question: "What type of analysis might be useful?" but not the question "How exactly do I perform the analysis using [GNUPlot or R or whatever]?"

          (4 of 4 customers found this review helpful)

           
          4.0

          Great for hands-on readers

          By mhanna

          from Montreal, Canada

          About Me Developer

          Verified Reviewer

          Pros

          • Easy to understand
          • Helpful examples

          Cons

          • Lacks example files

          Best Uses

          • Intermediate

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          Whether analyzing data is part of your everyday job or one of your projects needs specific data analysis, the same questions arise: What tools are the most suited for the task? Which techniques are the most adapted? What are the numbers really saying? Are they meaningful?

          If those questions are familiar to you, Data Analysis with Open Source Tools by Philipp K. Janert might be of great interest. This book is perfect for hands-on readers who want to achieve specific goals by getting right to the point, without getting entangled in formal definitions. Here is an analogy: a Common Kite is depicted on the front cover of the book and, like the bird waiting for its prey, you'll be able to analyse the situation, understand it and make a clear decision after this read.

          The book starts by giving graphical representations of several types of data sets. This is particularly useful to truly understand what we are working with, notably for visual readers. It gives important landmarks for further manipulations with the data. Indeed, within each section (18 of them), there is a subsection called Workshop where the reader gets his hands dirty on examples. The main tools used are Python and its libraries (NumPy, matplotlib, SciPy, etc.), R and gnuplot. On the negative side, although several practical examples are given in each section, no files are provided, so readers cannot follow exactly the same example on their side.

          More advanced techniques are given later in the book, going from clustering to data mining, but the author always finds a way to prune away nonessential concepts and focus on what the reader really needs to know.

          In short, I recommend this book to anyone wanting to give a little more kick to the tremendous amount of data surrounding them. Personal advice and little tricks make the read a most enjoyable one.

          (2 of 4 customers found this review helpful)

           
          2.0

          Nice book, but code examples are missing

          By gnuplot user

          from netherlands

          Verified Reviewer

          Comments about O'Reilly Media Data Analysis with Open Source Tools:

          I kind of like the book, especially its explanations of how and why to model, and how to use the results.
          I also like its use of open source tools. However, a major drawback is that the code used for the graphs/examples is missing. If the author advocates open source tools, he should provide the code so that readers can use it as examples for their own projects.

          Buying Options
          Ebook: $31.99
          Formats: APK, DAISY, ePub, Mobi, PDF
          Print & Ebook: $43.99
          Print: $39.99