Table of Contents

  1. Architectural Considerations for Hadoop Applications

    1. Chapter 1: Data Modeling in Hadoop

      1. Data Storage Options
      2. HDFS Schema Design
      3. HBase Schema Design
      4. Managing Metadata
      5. Conclusion
    2. Chapter 2: Data Movement

      1. Data Ingestion Considerations
      2. Data Ingestion Options
      3. Data Extraction
      4. Conclusion
    3. Chapter 3: Processing Data in Hadoop

      1. MapReduce
      2. Spark
      3. Abstractions
      4. Crunch
      5. Cascading
      6. Hive
      7. Impala
      8. Conclusion
    4. Chapter 4: Common Hadoop Processing Patterns

      1. Pattern: Removing Duplicate Records by Primary Key
      2. Pattern: Windowing Analysis
      3. Pattern: Time Series Modifications
      4. Conclusion
    5. Chapter 5: Graph Processing on Hadoop

      1. What Is a Graph?
      2. What Is Graph Processing?
      3. How Do You Process a Graph in a Distributed System?
      4. Giraph
      5. GraphX
      6. Which Tool to Use?
      7. Conclusion
    6. Chapter 6: Orchestration

      1. Why We Need Workflow Orchestration
      2. The Limits of Scripting
      3. The Enterprise Job Scheduler and Hadoop
      4. Orchestration Frameworks in the Hadoop Ecosystem
      5. Oozie Terminology
      6. Oozie Overview
      7. Oozie Workflow
      8. Workflow Patterns
      9. Parameterizing Workflows
      10. Classpath Definition
      11. Scheduling Patterns
      12. Executing Workflows
      13. Conclusion
    7. Chapter 7: Near-Real-Time Processing with Hadoop

      1. Stream Processing
      2. Apache Storm
      3. Trident
      4. Spark Streaming
      5. Flume Interceptors
      6. Which Tool to Use?
      7. Conclusion
  2. Case Studies

    1. Chapter 8: Clickstream Analysis

      1. Defining the Use Case
      2. Using Hadoop for Clickstream Analysis
      3. Design Overview
      4. Storage
      5. Ingestion
      6. Processing
      7. Analyzing
      8. Orchestration
      9. Conclusion
    2. Chapter 9: Fraud Detection

      1. Continuous Improvement
      2. Taking Action
      3. Architectural Requirements of Fraud Detection Systems
      4. Introducing Our Use Case
      5. High-Level Design
      6. Client Architecture
      7. Profile Storage and Retrieval
      8. Ingest
      9. Near-Real-Time and Exploratory Analytics
      10. Near-Real-Time Processing
      11. Exploratory Analytics
      12. What About Other Architectures?
      13. Conclusion
    3. Chapter 10: Data Warehouse

      1. Using Hadoop for Data Warehousing
      2. Defining the Use Case
      3. OLTP Schema
      4. Data Warehouse: Introduction and Terminology
      5. Data Warehousing with Hadoop
      6. High-Level Design
      7. Conclusion
    4. Appendix: Joins in Impala

      1. Broadcast Joins
      2. Partitioned Hash Join