Written by Ganglia designers and maintainers, this book shows you how to collect and visualize metrics from clusters, grids, and cloud infrastructures at any scale. Want to track CPU utilization from 50,000 hosts every ten seconds? Ganglia is just the tool you need, once you know how its main components work together. This hands-on book helps experienced system administrators take advantage of Ganglia 3.x.
Learn how to extend the base set of metrics you collect, fetch current values, see aggregate views of metrics, and observe time-series trends in your data. You’ll also examine real-world case studies of Ganglia installs that feature challenging monitoring requirements.
Determine whether Ganglia is a good fit for your environment
Learn how Ganglia’s gmond and gmetad daemons build a metric collection overlay
Plan for scalability early in your Ganglia deployment, with valuable tips and advice
Take data visualization to a new level with gweb, Ganglia’s web frontend
Write plugins to extend gmond’s metric-collection capability
Troubleshoot issues you may encounter with a Ganglia installation
Integrate Ganglia with the sFlow and Nagios monitoring systems
Contributors include: Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Peter Phaal, and Daniel Pocock. Case study writers include: John Allspaw, Ramon Bastiaans, Adam Compton, Andrew Dibble, and Jonah Horowitz.
Chapter 1 Introducing Ganglia
It’s a Problem of Scale
Hosts ARE the Monitoring System
Redundancy Breeds Organization
Is Ganglia Right for You?
gmond: Big Bang in a Few Bytes
gmetad: Bringing It All Together
gweb: Next-Generation Data Analysis
But Wait! That’s Not All!
Chapter 2 Installing and Configuring Ganglia
Chapter 3 Scalability
Who Should Be Concerned About Scalability?
gmond and Ganglia Cluster Scalability
gmetad Storage Planning and Scalability
Chapter 4 The Ganglia Web Interface
Navigating the Ganglia Web Interface
The gweb Search Tab
The gweb Views Tab
The gweb Aggregated Graphs Tab
The gweb Compare Hosts Tab
The gweb Events Tab
The gweb Automatic Rotation Tab
The gweb Mobile Tab
Custom Composite Graphs
Authentication and Authorization
Chapter 5 Managing and Extending Metrics
gmond: Metric Gathering Agent
Extending gmond with Modules
Extending gmond with gmetric
How to Choose Between C/C++, Python, and gmetric
Java and gmetric4j
Real World: GPU Monitoring with the NVML Module
Chapter 6 Troubleshooting Ganglia
Monitoring the Monitoring System
General Troubleshooting Mechanisms and Tools
Common Deployment Issues
Typical Problems and Troubleshooting Procedures
Chapter 7 Ganglia and Nagios
Sending Nagios Data to Ganglia
Monitoring Ganglia Metrics with Nagios
Displaying Ganglia Data in the Nagios UI
Monitoring Ganglia with Nagios
Chapter 8 Ganglia and sFlow
Standard sFlow Metrics
Configuring gmond to Receive sFlow
Host sFlow Agent
Using Ganglia with Other sFlow Tools
Chapter 9 Ganglia Case Studies
Reuters Financial Software
Lumicall (Mobile VoIP on Android)
Wait, How Many Metrics? Monitoring at Quantcast
Many Tools in the Toolbox: Monitoring at Etsy
Appendix Advanced Metric Configuration and Debugging
Module Metric Definitions
Advanced Metrics Aggregation and You
Debugging with gmond-debug
Appendix Ganglia and Hadoop/HBase
Introducing Hadoop and HBase
Configuring Hadoop and HBase to Publish Metrics to Ganglia
Matt Massie open-sourced Ganglia in 2000 while working as a Staff Researcher at the University of California, Berkeley. He designed Ganglia to monitor a shared computational grid of clusters distributed across the United States for scientific research. In 2010, he contributed a chapter on cluster monitoring for the O'Reilly book "Web Operations: Keeping the Data On Time" by John Allspaw and Jesse Robbins. Matt is currently a software engineer at Cloudera focused on Apache Hadoop enterprise management and monitoring.
Bernard Li is a High Performance Computing (HPC) Systems Engineer at Lawrence Berkeley National Laboratory. He is currently one of the maintainers of the Ganglia project. He has been involved with HPC since 2003 and has worked on Open Source projects such as OSCAR, SystemImager and Warewulf.
Brad Nicholes is a member of the Apache Software Foundation and is currently working as a Consultant Software Engineer for NetIQ. In addition to being a committer on the Apache HTTPD and APR projects, Brad is a developer and one of the administrators of the Ganglia project. As a developer on the Ganglia project, Brad developed and introduced the C/C++ and Python metric module interface into Ganglia 3.1.x. He also developed and contributed several of the initial metric modules that currently ship with Ganglia. Brad attended school at the University of Utah and Brigham Young University and holds a degree in Computer Science.
Vladimir Vuksan (Broadcom) has worked in technical operations, systems engineering, and software development for over 15 years. Prior to Broadcom, he worked at Mocospace, Rave Mobile Safety, Demandware, and the University of New Mexico, implementing high-availability solutions and building tools that make managing and running infrastructure easier.
The animal on the cover of Monitoring with Ganglia is the Porpita pacifica, which is found in the tropical Pacific. P. pacifica, commonly called the sea money or blue button, is a blue-fringed disc about 1.5 inches in diameter. Its delicate tentacles are sticky and extend from chambers in the gas-filled disc; the tentacles are usually damaged in the surf and reportedly deliver a sting that is not powerful but may cause irritation to human skin.
The blue button lives on the surface of the sea and consists of two main parts: the float and the hydroid colony. The hard golden-brown float is round, almost flat, and about 1 inch wide. The hydroid colony, which can range from bright blue turquoise to yellow, resembles tentacles like those of the jellyfish. Each strand has numerous branchlets, each of which ends in knobs of stinging cells called nematocysts.
In the food web, its size makes it easy prey for several organisms. The blue button itself is a passive drifter, meaning that it feeds on both living and dead organisms that come in contact with it. It competes with other drifters for food and mainly feeds on small fish, eggs, and zooplankton. The blue button has a single mouth located beneath the float, which is used for both the intake of nutrients and the expulsion of wastes. This species reproduces by releasing tiny medusae, which go on to develop new colonies.
The cover image is from Beauties and Wonders of Land and Sea. The cover font is Adobe ITC Garamond. The text font is Linotype Birka; the heading font is Adobe Myriad Condensed; and the code font is LucasFont’s TheSansMonoCondensed.
Ganglia is the most robust and scalable performance monitoring tool I've tried or heard of.
This book, written by some of the top contributors to the project, is an awesome guide to Ganglia.
Thanks to its organization and the authors' writing style, the book is easy to understand. It can be read as a full guide, from the first page to the last, or as a reference book, reading only the parts that are relevant to you at a given moment.
Because of the software's nature and history, the authors always speak about scalability and optimization for big numbers, which leaves a reader who is looking for a simple one-node configuration a little confused. The confusion, as far as I can tell, disappears pretty soon. This is the only downside I found in the book: there is a lot of information about big environments but close to no information specific to small or single-node environments.
The two chapters I liked the most are Chapter 7 (Ganglia and Nagios) and Chapter 9 (Ganglia Case Studies). Chapter 7, as the title makes easy to guess, suggests how to make Ganglia and Nagios pass information to each other. I believe this is very important, since I usually find that neither tool is perfect, but they are (close to) perfectly complementary. Chapter 9 brings up 6 real-world cases and a lot of information that is really useful if you plan to create an environment somewhat similar to one (or more) of the presented cases.
I would definitely suggest this book to anyone who is facing an installation or configuration of Ganglia, or to anyone who has to monitor multiple systems in an optimized way.
Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Blogger Program.
Bottom Line: Yes, I would recommend this to a friend.