Book description
Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig
In Detail
Pig Design Patterns is a comprehensive guide to design patterns that simplify the creation of complex data pipelines at various stages of data management. This book focuses on using Pig in an enterprise context, bridging the gap between theoretical understanding and practical implementation. Each chapter contains a set of design patterns that pose and then solve technical challenges relevant to enterprise use cases.
The book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, in the form of a report or a predictive model. By the end of the book, readers will appreciate Pig's real power in addressing the problems encountered when creating an analytics-based data product. Each design pattern comes with a suggested solution, an analysis of the trade-offs of implementing it differently, an explanation of how the code works, and the results.
What You Will Learn
- Understand Pig's relevance in an enterprise context
- Use Pig in design patterns that enable data movement across platforms during and after analytical processing
- See how Pig can co-exist with other components of the Hadoop ecosystem to create Big Data solutions using design patterns
- Simplify the process of creating complex data pipelines using transformations, aggregations, enrichment, cleansing, filtering, reformatting, lookups, and data type conversions
- Apply knowledge of Pig in design patterns that deal with integration of Hadoop with other systems to enable multi-platform analytics
- Comprehend design patterns and use Pig in cases that involve complex analysis of purely structured data
Table of contents
- Pig Design Patterns
- Table of Contents
- Pig Design Patterns
- Credits
- Foreword
- About the Author
- Acknowledgments
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Setting the Context for Design Patterns in Pig
- 2. Data Ingest and Egress Patterns
  - The context of data ingest and egress
  - Types of data in the enterprise
  - Ingest and egress patterns for multistructured data
  - The ingress and egress patterns for the NoSQL data
  - The ingress and egress patterns for structured data
  - The ingress and egress patterns for semi-structured data
  - JSON ingress and egress patterns
  - Summary
- 3. Data Profiling Patterns
- 4. Data Validation and Cleansing Patterns
  - Data validation and cleansing for Big Data
  - Choosing Pig for validation and cleansing
  - The constraint validation and cleansing design pattern
  - The regex validation and cleansing design pattern
  - The corrupt data validation and cleansing design pattern
  - The unstructured text data validation and cleansing design pattern
  - Summary
- 5. Data Transformation Patterns
- 6. Understanding Data Reduction Patterns
  - Data reduction – a quick introduction
  - Data reduction considerations for Big Data
  - Dimensionality reduction – the Principal Component Analysis design pattern
  - Numerosity reduction – the histogram design pattern
  - Numerosity reduction – sampling design pattern
  - Numerosity reduction – clustering design pattern
  - Summary
- 7. Advanced Patterns and Future Work
- Index
Product information
- Title: Pig Design Patterns
- Author(s):
- Release date: April 2014
- Publisher(s): Packt Publishing
- ISBN: 9781783285556