Hadoop: The Definitive Guide, 2nd Edition
Publisher: O'Reilly Media / Yahoo Press
Final Release Date: October 2010
Pages: 628

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
  • Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Analyze datasets with Hive, Hadoop’s data warehousing system
  • Take advantage of HBase, Hadoop’s database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."

--Doug Cutting, Cloudera

Customer Reviews

Review Snapshot

Average rating: 4.7 (based on 3 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (1)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Concise (3)

Cons

No Cons

Best Uses

  • Intermediate (3)
  • Novice (3)

Reviewer Profile

  • Developer (3)
Reviewed by 3 customers


(1 of 1 customers found this review helpful)

Rating: 4.0

A Definitive Source for Hadoop

By Sumit

from Lucknow, India

About Me: Developer

Pros

  • Accurate
  • Concise
  • Author is a Hadoop committer

Cons

  • Quite similar to the 1st edition

Best Uses

  • Expert
  • Intermediate
  • Novice

Comments about O'Reilly Hadoop: The Definitive Guide, 2nd Edition:

This is an excellent guide for professionals looking to deepen their knowledge of Hadoop. It is quite similar to the first edition, but it covers recent changes in the Hadoop ecosystem and is a comprehensive text. The author deserves applause for explaining a complex subject in an accessible way.

(4 of 4 customers found this review helpful)

 
Rating: 5.0

Great Knowledge Source

By Don

from San Marcos, CA

About Me: Developer, Project manager

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Targeted
  • Well-written

Cons

None listed

Best Uses

  • Curious
  • Expert
  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Hadoop: The Definitive Guide, 2nd Edition:

Back in October 2010, I became interested in Hadoop. I opened an Amazon Free Tier account to work with MapReduce and became more deeply involved. I discovered this book and asked O'Reilly for the eBook version in exchange for this review.

Including the Foreword, Acknowledgments, Preface, main text, and complete hyperlinked index, the total page count is 1,391.

This is one of the best technical books I've read. The author makes a great effort to explain in detail the origins, data concepts, components, shortcomings, and purpose of Hadoop. I was able to understand the supplied code because each code segment is explained thoroughly. As newer Hadoop versions arrive, some of the current limitations will be reduced or eliminated.

The author takes the time to explain where Hadoop would likely not be a good solution. This is important because users often embrace new technologies as "the answer to everything" without reservation or qualification.

Depending on how deeply the reader wishes to explore Hadoop, the author provides clear, understandable examples for Pig, Hive, HBase, ZooKeeper, and Sqoop.

The footnotes and external references sometimes led me to additional resources. One such book is "Data-Intensive Text Processing with MapReduce" by Jimmy Lin and Chris Dyer of the University of Maryland, College Park. I recommend reading that book too.

As a project manager and occasional programmer, I found this eBook extremely informative and easy to read. As more companies embrace "Big Data," it is important to understand where technical solutions like Hadoop fit in and whether they deserve top consideration. This book helps you become one of the deciders.

(3 of 4 customers found this review helpful)

Rating: 5.0

Best Hadoop book

By modernmarvel

from Newark, NJ

About Me: Developer

Verified Reviewer

Pros

  • Concise
  • Easy to understand
  • Well-written

Cons

None listed

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Hadoop: The Definitive Guide, 2nd Edition:

This book has a very clear description of Hadoop, HDFS, and Pig (the topics I read). I was able to understand every small detail of those topics from reading this book. I tried some other books, but they were not as clear or informative as this one.
