Optimizing Hadoop for MapReduce
By Khaled Tannir
Publisher: Packt Publishing
Final Release Date: February 2014
Pages: 120

In Detail

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation.

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster’s node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster’s node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

Approach

This book is an example-based tutorial that deals with optimizing Hadoop for MapReduce job performance.

Who this book is for

If you are a Hadoop administrator, developer, MapReduce user, or beginner who wishes to optimize your clusters and applications, this book is the best choice available. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and snippets of MapReduce class template code.

Customer Reviews

4.0

(based on 2 reviews)

Ratings Distribution

  • 5 Stars (0)
  • 4 Stars (2)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

Reviewed by 2 customers

4.0

Optimizing Hadoop for Map Reduce

By Umashankar

from Hyderabad

About Me: Designer, Developer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Production Knowledge
  • Production Scale Knowledge
  • Well-written

Cons

    Best Uses

    • Expert
    • Intermediate

    Comments about oreilly Optimizing Hadoop for MapReduce:

    Optimizing Hadoop for MapReduce
    If you are looking for something related to production scale, this is the book that you should have in your library. For a detailed review of this book, see:
    http://j2eedev.org/optimizing-hadoop-map-reducebook-review/

    (1 of 1 customers found this review helpful)

     
    4.0

    Good resource for Hadoop Devops

    By PavanKN

    from India

    About Me: Developer, Sys Admin

    Pros

    • Concise
    • Easy to understand
    • Well-written

    Cons

      Best Uses

      • Expert
      • Intermediate

      Comments about oreilly Optimizing Hadoop for MapReduce:

      I had a chance to review another book titled "Optimizing Hadoop for MapReduce" and must say this book is a good resource for devops professionals who build MapReduce programs in Hadoop. The book is well organized: it starts by introducing basic concepts, moves on to identifying system bottlenecks and resource weaknesses and suggesting ways to fix and optimize them, and closes with Hadoop best practices and recommendations. Though the book is packed with advanced concepts and information on Hadoop architecture, the author's writing could appeal to all types of audience (from novice to expert), with helpful hints in each chapter.

      The first chapter, on MapReduce, is written for people who are new to this paradigm. It contains pictorial representations of how the "low-level" MapReduce works. It is easy to misunderstand the low-level MapReduce process, and this chapter clarifies it.

      The second chapter discusses performance tuning parameters: allocating map/reduce tasks based on the number of cores in the respective Hadoop cluster. It also suggests widely used cluster management tools such as Ambari, Chukwa, etc.

      The third and fourth chapters discuss identifying system bottlenecks and resource weaknesses, respectively. The author takes an organized approach by introducing a performance tuning process cycle and demystifying how the major components of a given Hadoop cluster (CPU, RAM, storage, and network bandwidth) could cause a bottleneck and how to eliminate it. In the fourth chapter, I particularly liked the discussion of formulas that can be used when planning a Hadoop cluster, demonstrated using examples.

      The remaining three chapters focus on enhancing and optimizing the map/reduce tasks, and on best practices and recommendations. The author introduces performance metrics for map/reduce tasks and suggests ways to enhance the tasks and fine-tune parameters to improve the performance of a MapReduce job. The final chapter, on best practices, is packed with valuable information on hardware tuning for optimal performance of the Hadoop cluster and on Hadoop best practices.

      A few minor points here and there should be read with caution. For instance, in the first chapter the author says each slave is called a task tracker; it would have been better to say that it assumes the responsibilities of a task tracker, while in general it is actually called a data node. That is just my suggestion. In short, this book is a compilation of MapReduce performance-related issues and ideas on troubleshooting and optimizing that performance, including best practices. A must-have book, especially for Hadoop administrators and developers.

      Buying Options
      Ebook: $20.99
      Formats:  ePub, Mobi, PDF