Book description
This book is the perfect introduction to sophisticated concepts in MapReduce and will ensure you have the knowledge to optimize job performance. This is not an academic treatise; it's an example-driven tutorial for the real world.
In Detail
MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work around a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log statistics, inverted index construction, document clustering, machine learning, and statistical machine translation.
This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally.
It details the Hadoop MapReduce job performance optimization process through a number of clear and practical steps.
Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.
The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.
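To make the Combiner idea mentioned above more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and is not taken from the book: the function names (`map_phase`, `combine`, `reduce_phase`, `run_job`) and the sample input are hypothetical, chosen only to illustrate how pre-aggregating each mapper's output can shrink the number of records sent to the reducers while leaving the final result unchanged.

```python
from collections import Counter, defaultdict


def map_phase(split):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Pre-aggregate a single mapper's output locally, before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_phase(all_pairs):
    """Sum the counts for each word across the output of all mappers."""
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


def run_job(splits, use_combiner):
    """Run the simulated job; return the result and the shuffled record count."""
    shuffled = []
    for split in splits:  # each split stands in for one map task (assumption)
        pairs = map_phase(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_phase(shuffled), len(shuffled)


# Hypothetical input: two splits, one line each.
splits = [
    ["to be or not to be"],
    ["to see or not to see"],
]

plain_result, plain_records = run_job(splits, use_combiner=False)
combined_result, combined_records = run_job(splits, use_combiner=True)

print(plain_result == combined_result)  # same final counts either way
print(plain_records, combined_records)  # fewer records reach the reducer
```

In this toy run the combiner changes only the volume of intermediate data, not the answer. That works here because summing counts is associative and commutative; an operation without those properties could not be pre-aggregated this way.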
What You Will Learn
- Learn about the factors that affect MapReduce performance
- Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
- Size your Hadoop cluster's nodes
- Set the number of mappers and reducers correctly
- Optimize mapper and reducer task throughput and code size using compression and Combiners
- Understand the various tuning properties and best practices to optimize clusters
Table of contents
- Table of Contents
- Optimizing Hadoop for MapReduce
- Credits
- About the Author
- Acknowledgments
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Understanding Hadoop MapReduce
- 2. An Overview of the Hadoop Parameters
- 3. Detecting System Bottlenecks
- 4. Identifying Resource Weaknesses
- 5. Enhancing Map and Reduce Tasks
- 6. Optimizing MapReduce Tasks
- 7. Best Practices and Recommendations
- Index
Product information
- Title: Optimizing Hadoop for MapReduce
- Author(s):
- Release date: February 2014
- Publisher(s): Packt Publishing
- ISBN: 9781783285655