Hadoop: The Definitive Guide
MapReduce for the Cloud
Publisher: O'Reilly Media
Final Release Date: May 2009
Pages: 528

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Customer Reviews

(based on 3 reviews)

In-depth coverage in simple words

By Amit

from Noida

About Me Developer

Verified Reviewer


  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

      Comments about O'Reilly Hadoop: The Definitive Guide:

      I started learning Hadoop on my own by reading various blogs and articles, but I was not able to get the flow. A few months back I read through the reviews of this book and purchased it.
      This is an amazing book if you are really serious about learning Hadoop and applying it in your project. I have almost finished this book and have learned a great deal.

      (16 of 16 customers found this review helpful)


      The elephant has been tamed

      By Paolo at JUG Lugano

      from Lugano, Switzerland

      About Me Designer, Developer

      Verified Reviewer


      • Accurate
      • Helpful examples
      • Well-written


        Best Uses

        • Expert
        • Intermediate
        • Student

        Comments about O'Reilly Hadoop: The Definitive Guide:

        Managing and analyzing huge data sets has become a very common problem in various areas of modern information technology, from different types of Web applications (social, financial, trading, ...) to applications for analyzing scientific data.

        Distributed systems running on a cluster of machines are almost a mandatory choice in such cases, but designing and implementing an effective solution in these areas can be troublesome and can quickly become a nightmare.

        The Apache Hadoop Project is an infrastructure that helps in building reliable, scalable, distributed systems. Mainly known for its MapReduce and distributed file system (HDFS) subprojects, it actually includes other services that complement or extend them.

        Tom White's "Hadoop: The Definitive Guide" is an enjoyable book that fully explains these complex technologies. The book is organized so that the reader is gently guided into the Hadoop ecosystem. It begins with a couple of very readable chapters that give a general introduction to the problems Hadoop is meant to solve and to its main solutions (MapReduce and HDFS), then examines all of its aspects closely, often describing what really happens behind the scenes, offering useful design suggestions, and describing common pitfalls. When reading this book you won't be overwhelmed by lines and lines of code: the examples are short yet effective.

        This kind of structure makes it hard to classify the book as either a mere tutorial or a true reference guide; it is better considered a mix of the two. While this turns out to be a positive choice in many ways, it has some drawbacks: the reader is sometimes forced to go back and forth between chapters and has to read almost the entire book to get a full understanding. But this is perhaps the price to pay for a fluent and pleasant read.

        Let's go quickly through the chapters:

        The first chapter is a brief history of the Hadoop project, illustrating its main characteristics and comparing them with those of other similar technologies. Chapter two is a pleasant introduction to MapReduce. The third chapter breaks continuity with the previous one by examining the Hadoop Distributed File System (the HDFS subproject) in detail. Chapter four steps down a level of abstraction to discuss the Hadoop I/O fundamentals: data integrity, compression, serialization, and data structures, explaining the design choices behind them.

        Chapters five to eight are an excellent source for learning Hadoop MapReduce in depth. They cover all of its aspects, from practical ones, such as how to configure, run, test, and debug MapReduce programs, to more advanced and formal ones, such as programming models, data formats, and tools for sorting and joining.

        The following two chapters offer a few very interesting and useful suggestions for setting up and managing a Hadoop cluster, a precious resource for administrators.

        Chapters eleven to thirteen cover the Pig, HBase, and ZooKeeper subprojects under the Hadoop umbrella. Although they suffer from brevity, they are still interesting.

        Chapter fourteen is there so that the reader doesn't feel alone: it presents important case studies of Hadoop in use (e.g., at Yahoo!, and other contributions from the Apache Hadoop community).

        My final opinion is that "Hadoop: The Definitive Guide" is a very useful resource for those who want to learn how to ride the "pachydermic" Hadoop (like a "Mahout", perhaps?).

        (7 of 19 customers found this review helpful)


        Great Book!

        By Eko Kurniawan Khannedy

        from Bandung, Indonesia

        Comments about O'Reilly Hadoop: The Definitive Guide:

        Just want to say "Great Book!"

        Buying Options
        Ebook:  $35.99
        Formats:  APK, DAISY, ePub, Mobi, PDF