Table of Contents

  1. Introduction: Theory and Tools

    1. Chapter 1 Hadoop Basics

      1. Chimpanzee and Elephant Start a Business
      2. Map-Only Jobs: Process Records Individually
      3. Pig Latin Map-Only Job
      4. Setting Up a Docker Hadoop Cluster
      5. Wrapping Up
    2. Chapter 2 MapReduce

      1. Chimpanzee and Elephant Save Christmas
      2. Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench
      3. Example: Reindeer Games
      4. Hadoop Versus Traditional Databases
      5. The MapReduce Haiku
      6. Wrapping Up
    3. Chapter 3 A Quick Look into Baseball

      1. The Data
      2. Acronyms and Terminology
      3. The Rules and Goals
      4. Performance Metrics
      5. Wrapping Up
    4. Chapter 4 Introduction to Pig

      1. Pig Helps Hadoop Work with Tables, Not Records
      2. Fundamental Data Operations
      3. LOAD Locates and Describes Your Data
      4. STORE Writes Data to Disk
      5. Development Aid Commands
      6. Pig Functions
      7. Piggybank
      8. Apache DataFu
      9. Wrapping Up
  2. Tactics: Analytic Patterns

    1. Chapter 5 Map-Only Operations

      1. Pattern in Use
      2. Eliminating Data
      3. Selecting Records That Satisfy a Condition: FILTER and Friends
      4. Project Only Chosen Columns by Name
      5. Transforming Records
      6. Operations That Break One Table into Many
      7. Operations That Treat the Union of Several Tables as One
      8. Wrapping Up
    2. Chapter 6 Grouping Operations

      1. Grouping Records into a Bag by Key
      2. Group and Aggregate
      3. Calculating the Distribution of Numeric Values with a Histogram
      4. The Summing Trick
      5. Wrapping Up
      6. References
    3. Chapter 7 Joining Tables

      1. Matching Records Between Tables (Inner Join)
      2. How a Join Works
      3. Enumerating a Many-to-Many Relationship
      4. Joining a Table with Itself (Self-Join)
      5. Joining Records Without Discarding Nonmatches (Outer Join)
      6. Selecting Only Records That Lack a Match in Another Table (Anti-Join)
      7. Selecting Only Records That Possess a Match in Another Table (Semi-Join)
      8. Wrapping Up
    4. Chapter 8 Ordering Operations

      1. Preparing Career Epochs
      2. Sorting All Records in Total Order
      3. Sorting Records Within a Group
      4. Numbering Records in Rank Order
      5. Wrapping Up
    5. Chapter 9 Duplicate and Unique Records

      1. Handling Duplicates
      2. Set Operations
      3. Wrapping Up