Table of Contents

  1. Introduction to Distributed Computing

    1. Chapter 1 The Age of the Data Product

      1. What Is a Data Product?
      2. Building Data Products at Scale with Hadoop
      3. The Data Science Pipeline and the Hadoop Ecosystem
      4. Conclusion
    2. Chapter 2 An Operating System for Big Data

      1. Basic Concepts
      2. Hadoop Architecture
      3. Working with a Distributed File System
      4. Working with Distributed Computation
      5. Submitting a MapReduce Job to YARN
      6. Conclusion
    3. Chapter 3 A Framework for Python and Hadoop Streaming

      1. Hadoop Streaming
      2. A Framework for MapReduce with Python
      3. Advanced MapReduce
      4. Conclusion
    4. Chapter 4 In-Memory Computing with Spark

      1. Spark Basics
      2. Interactive Spark Using PySpark
      3. Writing Spark Applications
      4. Conclusion
    5. Chapter 5 Distributed Analysis and Patterns

      1. Computing with Keys
      2. Design Patterns
      3. Toward Last-Mile Analytics
      4. Conclusion
  2. Workflows and Tools for Big Data Science

    1. Chapter 6 Data Mining and Warehousing

      1. Structured Data Queries with Hive
      2. HBase
      3. Conclusion
    2. Chapter 7 Data Ingestion

      1. Importing Relational Data with Sqoop
      2. Ingesting Streaming Data with Flume
      3. Conclusion
    3. Chapter 8 Analytics with Higher-Level APIs

      1. Pig
      2. Spark’s Higher-Level APIs
      3. Conclusion
    4. Chapter 9 Machine Learning

      1. Scalable Machine Learning with Spark
      2. Conclusion
    5. Chapter 10 Summary: Doing Distributed Data Science

      1. Data Product Lifecycle
      2. Machine Learning Lifecycle
      3. Conclusion
    6. Appendix Creating a Hadoop Pseudo-Distributed Development Environment

      1. Quick Start
      2. Setting Up Linux
      3. Installing Hadoop
    7. Appendix Installing Hadoop Ecosystem Products

      1. Packaged Hadoop Distributions
      2. Self-Installation of Apache Hadoop Ecosystem Products