Hadoop: The Definitive Guide
MapReduce for the Cloud
Publisher: O'Reilly Media
Released: May 2009
Pages: 528
Customer Reviews

O'Reilly Media Hadoop: The Definitive Guide

4.5 (based on 2 reviews)

Ratings Distribution

  • 5 Stars (1)
  • 4 Stars (1)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)


(5 of 5 customers found this review helpful)

 
4.0

The elephant has been tamed

By Paolo at JUG Lugano

from Lugano, Switzerland

About Me Designer, Developer

Verified Reviewer

Pros

  • Accurate
  • Helpful examples
  • Well-written

Cons

    Best Uses

    • Expert
    • Intermediate
    • Student

    Comments about O'Reilly Media Hadoop: The Definitive Guide:

    Managing and analyzing huge data sets has become a very common problem in various areas of modern information technology, from different types of Web applications (social, financial, trading, ...) to applications for analyzing scientific data.

    Distributed systems running over a cluster of machines are almost a mandatory choice in such cases, but designing and implementing an effective solution in those areas can be troublesome, even a nightmare.

    The Apache Hadoop Project is an infrastructure that helps in building reliable, scalable, distributed systems. Mainly known for its MapReduce and distributed file system (HDFS) subprojects, it also includes other services that complement or extend them.

    Tom White's "Hadoop: The Definitive Guide" is an enjoyable book which fully explains these complex technologies. The book is organized in such a way that the reader is gently guided into the Hadoop ecosystem. It begins with a couple of very readable chapters as a general introduction to the problems Hadoop is meant to solve and the main solutions to them (MapReduce and HDFS), then closely examines all its aspects, often describing what really happens behind the scenes and giving useful design suggestions and descriptions of common pitfalls. When reading this book you won't be overwhelmed by tons of lines of code: the examples are short and yet effective.

    This structure makes it hard to classify the book as a mere tutorial or as a real reference guide; it is better considered a mix of the two. While this turns out to be a positive choice in many ways, it has some drawbacks: the reader is sometimes forced to go back and forth through the chapters and has to read almost the entire book to get a full understanding. But this is perhaps the price to pay for a fluent and pleasant read.

    Let's go quickly through the chapters:

    The first chapter is a brief history of the Hadoop project, illustrating its main characteristics and comparing them to those of other similar technologies. Chapter two is a pleasant introduction to MapReduce. The third chapter breaks the continuity of the previous one, examining the Hadoop Distributed File System (HDFS subproject) in detail. Chapter four takes a step down in the abstraction layer, covering the Hadoop I/O fundamentals (data integrity, compression, serialization and data structures) and explaining the design choices.

    Chapters five to eight are an excellent source for learning Hadoop MapReduce in depth. They cover all of its aspects, from practical ones, such as how to configure, run, test and debug MapReduce programs, to more advanced and formal ones, like programming models, data formats, and sorting and joining tools.

    The following two chapters list a few very interesting and useful suggestions for setting up and managing a Hadoop cluster, a precious resource for administrators.

    Chapters eleven to thirteen cover the Pig, HBase and ZooKeeper subprojects under the Hadoop umbrella. Despite suffering from brevity, they are still interesting.

    Chapter fourteen is there so that the reader does not feel alone: it presents important case studies using Hadoop (e.g. Yahoo, and other contributions from the Apache Hadoop community).

    My final opinion is that "Hadoop: The Definitive Guide" is a very useful resource for those who want to learn how to ride the "pachydermic" Hadoop (like a "Mahout", perhaps?).

    (1 of 9 customers found this review helpful)

     
    5.0

    Great Book!

    By Eko Kurniawan Khannedy

    from Bandung, Indonesia

    Comments about O'Reilly Media Hadoop: The Definitive Guide:

    Just want to say "Great Book!"

    Buying Options
    Ebook: $35.99
    Formats: APK, ePub, Mobi, PDF