If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.
Get a high-level overview of HDFS and MapReduce: why they exist and how they work
Plan a Hadoop deployment, from hardware and OS selection to network requirements
Learn setup and configuration details with a list of critical properties
Manage resources by sharing a cluster across multiple groups
Get a runbook of the most common cluster maintenance tasks
Monitor Hadoop clusters—and learn troubleshooting with the help of real-world war stories
Use basic tools and techniques to handle backup and catastrophic failure
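As a taste of the configuration material the book covers, a minimal `hdfs-site.xml` might pin down a few of the critical properties mentioned above. This is an illustrative sketch only, not a recommended production configuration; the property names are standard Apache Hadoop 2.x, but the paths and values here are placeholders:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: illustrative values only; adjust paths and
     replication for your own cluster. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>   <!-- number of block replicas (Hadoop default is 3) -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn</value>   <!-- placeholder path for namenode metadata -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>   <!-- placeholder datanode dirs -->
  </property>
</configuration>
```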
Chapter 1 Introduction
Chapter 2 HDFS
Goals and Motivation
Reading and Writing Data
Managing Filesystem Metadata
Namenode High Availability
Access and Integration
Chapter 3 MapReduce
The Stages of MapReduce
Introducing Hadoop MapReduce
Chapter 4 Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop
Operating System Selection and Preparation
Chapter 5 Installation and Configuration
Configuration: An Overview
Environment Variables and Shell Scripts
Namenode High Availability
Chapter 6 Identity, Authentication, and Authorization
Eric Sammer is currently a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, develop for, and use Hadoop and its related projects at scale. His background is in the development and operations of distributed, highly concurrent data ingest and processing systems. He has been active in the open source community and has contributed to a large number of projects over the last decade.
The animal on the cover of Hadoop Operations is a spotted cavy, or lowland paca. The large rodent goes by different names depending on where it lives: tepezcuintle in Mexico and Central America, pisquinte in Costa Rica, jaleb in the Yucatán peninsula, conejo pintado in Panamá, guanta in Ecuador, and so on. The name comes from the now-extinct Tupian language of Brazil, meaning "awaken" and "alert."
The paca has coarse fur and strong legs, at the end of which are four digits in the front and five on the back; pacas use their nails as hooves. Usually weighing in at about 13 to 26 pounds, the paca typically has two litters per year.
Overall, this rodent keeps to itself and is often described as a quiet, solitary, nocturnal animal. Pacas live in burrows that they dig themselves, about seven feet into the ground. They prefer to live near water, which is where they tend to run for escape when threatened. Living in the tropical Americas means a diet of fruit such as avocado and mango, as well as leaves, stems, roots, and seeds. These animals are great climbers and gather their own fruit. Considered a pest by farmers harvesting yam, sugar cane, corn, and cassava, the lowland paca is hunted for its delicious meat in Belize.
The cover image is from Shaw’s Zoology. The cover font is Adobe ITC Garamond. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont’s TheSansMonoCondensed.
I used this book to get more information on Hadoop architecture and the technology behind it. It has great detail on installation if you want to install the Cloudera Hadoop distribution. There are some typing errors in the ebook, and the early release interrupts the reading flow because of missing parts.
Bottom line: No, I would not recommend this to a friend.