Mastering Hadoop

by Sandeep Karanth

Released December 2014

Publisher(s): Packt Publishing

ISBN: 9781783983643

Book description

Go beyond the basics and master the next generation of Hadoop data processing platforms

In Detail

Hadoop is synonymous with Big Data processing. Its simple programming model, "code once and deploy at any scale" paradigm, and ever-growing ecosystem make Hadoop an all-encompassing platform for programmers with different levels of expertise.

This book explores industry guidelines for optimizing MapReduce jobs and higher-level abstractions such as Pig and Hive in Hadoop 2.0. It then dives deep into Hadoop 2.0-specific features such as YARN and HDFS Federation.

This book is a step-by-step guide that focuses on advanced Hadoop concepts and aims to take your Hadoop knowledge and skill set to the next level. The data processing flow dictates the order of the concepts in each chapter, and each chapter is illustrated with code fragments or schematic diagrams.

What You Will Learn

Understand the changes involved in moving from Hadoop 1.0 to Hadoop 2.0
Customize and optimize MapReduce jobs in Hadoop 2.0
Explore Hadoop I/O and different data formats
Dive into YARN and Storm and use YARN to integrate Storm with Hadoop
Deploy Hadoop on Amazon Elastic MapReduce
Discover HDFS replacements and learn about HDFS Federation
Get to grips with Hadoop's main security aspects
Utilize Mahout and RHadoop for Hadoop analytics