To help you answer big data questions, this unique guide shows you how to use simple, fun, and elegant tools leveraging Apache Hadoop. You’ll learn how to break problems into efficient data transformations to meet most of your analysis needs. Its developer-friendly approach works well for anyone using Hadoop, and flattens the learning curve for those working with big data for the first time.
Written by Philip Kromer, founder and CTO at Infochimps, this book uses real data and real problems to illustrate patterns found across knowledge domains. It equips you with a fundamental toolkit for performing statistical summaries, text mining, spatial and time-series analysis, and light machine learning. If you work in an elastic cloud environment, you’ll learn superpowers that make exploratory analytics especially efficient.
Learn from detailed example programs that apply Hadoop to interesting problems in context
Gain advice and best practices for efficient software development
Discover how to think at scale by understanding how data must flow through the cluster to effect transformations
Identify the tuning knobs that matter, and learn rules of thumb for knowing when they're needed
Philip (Flip) Kromer is the founder and CTO at Infochimps.com, a big data platform that makes acquiring, storing, and analyzing massive data streams transformatively easier. He enjoys bowling, Scrabble, working on old cars or new wood, and rooting for the Red Sox.