Hadoop: The Definitive Guide, 4th Edition
Storage and Analysis at Internet Scale
Publisher: O'Reilly Media
Final Release Date: March 2015
Pages: 756

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.

  • Learn fundamental components such as MapReduce, HDFS, and YARN
  • Explore MapReduce in depth, including steps for developing applications with it
  • Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
  • Learn two data formats: Avro for data serialization and Parquet for nested data
  • Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
  • Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
  • Learn the HBase distributed database and the ZooKeeper distributed configuration service
Customer Reviews

Average rating: 4.8 (based on 4 reviews)

Ratings Distribution

  • 5 Stars (3)
  • 4 Stars (1)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Accurate (4)
  • Helpful examples (4)
  • Well-written (4)

Cons

No Cons

Best Uses

  • Novice (4)
  • Intermediate (3)

Reviewer Profile

  • Developer (4)

Reviewed by 4 customers

(1 of 2 customers found this review helpful)

 
4.0

Best book for Hadoop beginners

By Vilas

from Pune,India

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Helpful examples
  • Well-written

Cons

    Best Uses

    • Intermediate
    • Novice

    Comments about oreilly Hadoop: The Definitive Guide, 4th Edition:

    I started out searching for a good book to study Hadoop and its ecosystem, and my search ended at Hadoop: The Definitive Guide. It covers everything from beginner to expert level; the HDFS and MapReduce parts in particular are explained very well. Everything in this book is really valuable, and I promise you that reading it will not be a waste of time or money. A big thanks to Tom White.

    (1 of 1 customers found this review helpful)

     
    5.0

    Great introduction to Hadoop and its ecosystem

    By Stefan Buenten

    from Cologne, Germany

    About Me Developer, Maker, Sys Admin

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Novice
      • Student

      Comments about oreilly Hadoop: The Definitive Guide, 4th Edition:

      After reading the guide, Hadoop and its ecosystem are clearer to me than ever before. Everything makes sense now. And the read itself was totally painless. Tom White manages to combine general information about Hadoop, MapReduce, and the projects of its ecosystem with a solid technical basis and helpful examples. I can only recommend this book to anyone starting to work with Hadoop.

      (1 of 1 customers found this review helpful)

       
      5.0

      Perfect!

      By Stefano Baghino

      from Milan, Italy

      About Me Developer

      Verified Reviewer

      Pros

      • Accurate
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Intermediate
        • Novice

        Comments about oreilly Hadoop: The Definitive Guide, 4th Edition:

        This fourth edition of "Hadoop: The Definitive Guide" absolutely lives up to the expectations I had after reading enthusiastic reviews of the previous editions. I needed a comprehensive guide to quickly gain knowledge about Hadoop and its rich ecosystem, and this book really got the job done. Not only that, but the depth with which each topic is addressed gave me a good, practical understanding of the dos and don'ts.
        Hadoop itself has been modularized in a healthy way, making it much more than the original, monolithic cluster computing framework. This modularization is well reflected in the book: after the first few chapters, the reader is free to roam around without having to read it cover to cover, learning about this or that piece of the whole system as they see fit, which makes for a much more enjoyable and productive learning experience.
        But Hadoop is much more than its core components. The distributed filesystem HDFS, the cluster resource manager YARN, and the computing framework MapReduce serve as the basis of a rich ecosystem of cluster-ready libraries and applications that have come to be just as important as the core components themselves, especially as Hadoop vendors tend to bundle these tools together to form an enterprise-ready platform. This is again well reflected in the guide: the distributed coordination service ZooKeeper, the BigTable implementation HBase, the in-memory cluster computing framework Spark, and many more topics are covered, giving you a good starting point for working with these technologies.
        The book ends with some interesting real-world use cases of Hadoop, giving you some really fascinating insight and some dos and don'ts that can save you precious time when designing solutions on the Hadoop platform.
        I really recommend this book to anyone interested in the topic, and even to professionals who have already dealt with Hadoop in the past but never really had the chance to understand how it works.

        (3 of 4 customers found this review helpful)

         
        5.0

        Excellent book on Hadoop

        By Kashif

        from London

        About Me Developer

        Verified Buyer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Expert
          • Intermediate
          • Novice
          • Student

          Comments about oreilly Hadoop: The Definitive Guide, 4th Edition:

          I picked this book after getting positive feedback about it on web forums, and I am very happy that I purchased it. It covers Hadoop in good detail and does a great job of describing related topics such as Flume and Spark.
          I would suggest adding a chapter on how to design a Hadoop ecosystem architecture, showing how it fits with other Big Data components.

          Buying Options
          Ebook:  $42.99
          Formats:  ePub, Mobi, PDF
          Print & Ebook:  $54.99
          Print:  $49.99