Table of Contents

  1. Chapter 1 What Is Pig?

    1. Pig Latin, a Parallel Data Flow Language

    2. Pig on Hadoop

    3. What Is Pig Useful For?

    4. The Pig Philosophy

    5. Pig’s History

  2. Chapter 2 Installing and Running Pig

    1. Downloading and Installing Pig

    2. Running Pig

    3. Grunt

  3. Chapter 3 Pig’s Data Model

    1. Types

    2. Schemas

  4. Chapter 4 Introduction to Pig Latin

    1. Preliminary Matters

    2. Input and Output

    3. Relational Operations

    4. User-Defined Functions

  5. Chapter 5 Advanced Pig Latin

    1. Advanced Relational Operations

    2. Integrating Pig with Executables and Native Jobs

    3. split and Nonlinear Data Flows

    4. Controlling Execution

    5. Pig Latin Preprocessor

  6. Chapter 6 Developing and Testing Pig Latin Scripts

    1. Development Tools

    2. Testing Your Scripts with PigUnit

  7. Chapter 7 Making Pig Fly

    1. Writing Your Scripts to Perform Well

    2. Writing Your UDFs to Perform

    3. Tuning Pig and Hadoop for Your Job

    4. Using Compression in Intermediate Results

    5. Data Layout Optimization

    6. Map-Side Aggregation

    7. The JAR Cache

    8. Processing Small Jobs Locally

    9. Bloom Filters

    10. Schema Tuple Optimization

    11. Dealing with Failures

  8. Chapter 8 Embedding Pig

    1. Embedding Pig Latin in Scripting Languages

    2. Using the Pig Java APIs

  9. Chapter 9 Writing Evaluation and Filter Functions

    1. Writing an Evaluation Function in Java

    2. The Algebraic Interface

    3. The Accumulator Interface

    4. Writing Filter Functions

    5. Writing Evaluation Functions in Scripting Languages

  10. Chapter 10 Writing Load and Store Functions

    1. Load Functions

    2. Store Functions

    3. Shipping JARs Automatically

    4. Handling Bad Records

  11. Chapter 11 Pig on Tez

    1. What Is Tez?

    2. Running Pig on Tez

    3. Potential Differences When Running on Tez

    4. Pig on Tez Internals

  12. Chapter 12 Pig and Other Members of the Hadoop Community

    1. Pig and Hive

    2. Cascading

    3. Spark

    4. NoSQL Databases

    5. DataFu

    6. Oozie

  13. Chapter 13 Use Cases and Programming Examples

    1. Sparse Tuples

    2. k-Means

    3. intersect and except

    4. Pig at Yahoo!

    5. Pig at Particle News

  14. Appendix Built-in User-Defined Functions and PiggyBank

    1. Built-in UDFs

    2. PiggyBank