How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems.
Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You’ll learn about Flume’s design and implementation, as well as the features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.
Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
Dive into key Flume components, including sources that accept data and sinks that write and deliver it
Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
Explore APIs for sending data to Flume agents from your own applications
Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running
Chapter 1. Apache Hadoop and Apache HBase: An Introduction
Chapter 2. Streaming Data Using Apache Flume
The Need for Flume
Is Flume a Good Fit?
Inside a Flume Agent
Configuring Flume Agents
Getting Flume Agents to Talk to Each Other
Replicating Data to Various Destinations
Flume’s No Data Loss Guarantee, Channels, and Transactions
Agent Failure and Data Loss
The Importance of Batching
What About Duplicates?
Running a Flume Agent
Lifecycle of a Source
Spooling Directory Source
Writing Your Own Sources*
Channels Bundled with Flume
Lifecycle of a Sink
Optimizing the Performance of Sinks
Writing to HDFS: The HDFS Sink
Morphline Solr Sink
Elastic Search Sink
Other Sinks: Null Sink, Rolling File Sink, Logger Sink
Writing Your Own Sink*
Chapter 6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
Sink Groups and Sink Processors
Chapter 7. Getting Data into Flume*
Building Flume Events
Flume Client SDK
Chapter 8. Planning, Deploying, and Monitoring Flume
Hari Shreedharan is a PMC member and committer on the Apache Flume project. As a PMC member, he helps make decisions about the direction of the project. Hari is also a software engineer at Cloudera, where he works on Apache Flume and Apache Sqoop. He also helps customers successfully deploy and manage Flume and Sqoop on their clusters by resolving any issues they encounter. Hari received his bachelor's degree from Malaviya National Institute of Technology, Jaipur, India, and his master's degree in computer science from Cornell University in 2010.
The animal on the cover of Using Flume is a burbot (Lota lota), a fish of northern waters that is often found in clean, large rivers and deep, cold lakes. Also known as mariah, the lawyer, and eelpout, the burbot is closely related to the marine common ling and the cusk.
Burbot are unusual looking, with a head like a catfish, a body like an eel, and very small scales that make it smooth and slimy to the touch. They are marked by a single barbel on their chin (the fish’s name comes from barba, the Latin word for "beard"). They are aggressive predators and primarily fish eaters but, at times, burbot will also eat insects and have been known to eat frogs, snakes, and birds.
Burbot are the only freshwater fish to spawn in midwinter. Spawning takes place when water temperatures are between 32°F and 40°F, often under ice cover. They are difficult to study, due to their deep habitats and reproduction under ice, but they provide great fishing opportunities for winter anglers. In fact, the town of Walker, Minnesota, holds an International Eelpout Festival every winter on Leech Lake.
Many of the animals on O'Reilly covers are endangered; all of them are important to the world. To learn more about how you can help, go to animals.oreilly.com.
The cover image is from Meyers Kleines Lexicon. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag's Ubuntu Mono.