Table of Contents

Chapter: Introduction

Building Spark Applications LiveLessons: Introduction

05m 2s

Chapter: Lesson 1: Introduction to the Spark Environment


00m 49s

1.1 Getting the Materials

02m 39s

1.2 A Brief Historical Diversion

07m 17s

1.3 Origins of the Framework

07m 23s

1.4 Why Spark?

19m 12s

1.5 Getting Set Up: Spark and Java

09m 47s

1.6 Getting Set Up: Scientific Python

05m 7s

1.7 Getting Set Up: R Kernel for Jupyter

09m 11s

1.8 Your First PySpark Job

18m 4s

1.9 Introduction to RDDs: Functions, Transformations, and Actions

23m 6s

1.10 MapReduce with Spark: Programming with Key-Value Pairs

17m 15s

Chapter: Lesson 2: Spark Programming APIs


01m 2s

2.1 Introduction to the Spark Programming APIs

10m 53s

2.2 PySpark: Loading and Importing Data

19m 31s

2.3 PySpark: Parsing and Transforming Data

09m 41s

2.4 PySpark: Analyzing Flight Delays

20m 51s

2.5 SparkR: Introduction to DataFrames

20m 33s

2.6 SparkR: Aggregations and Analysis

08m 33s

2.7 SparkR: Visualizing Data with ggplot2

09m 41s

2.8 Why (Spark) SQL?

03m 42s

2.9 Spark SQL: Adding Structure to Your Data

31m 46s

2.10 Spark SQL: Integration into Existing Workflows

04m 42s

Chapter: Lesson 3: Your First Spark Application


01m 10s

3.1 How Spark Fits into the Data Science Process

14m 28s

3.2 Introduction to Exploratory Data Analysis

10m 8s

3.3 Case Study:

17m 40s

3.4 Data Quality Checks with Accumulators

18m 49s

3.5 Making Sense of Data: Summary Statistics and Distributions

14m 50s

3.6 Working with Text: Introduction to NLP

07m 43s

3.7 Tokenization and Vectorization with Spark

17m 52s

3.8 Summarization with tf-idf

20m 16s

3.9 Introduction to Machine Learning

20m 47s

3.10 Unsupervised Learning with Spark: Implementing k-means

24m 4s

3.11 Testing k-means with Essays

09m 14s

3.12 Challenges of k-means: Latent Features, Interpretation, and Validation

21m 37s

Chapter: Lesson 4: Spark Internals


00m 55s

4.1 Introduction to Distributed Systems

15m 55s

4.2 Building Systems That Scale

11m 37s

4.3 The Spark Execution Context

10m 8s

4.4 RDD Deep Dive: Dependencies and Lineage

11m 48s

4.5 A Day in the Life of a Spark Application

14m 1s

4.6 How Code Runs: Stages, Tasks, and the Shuffle

13m 21s

4.7 Spark Deployment: Local and Cluster Modes

20m 50s

4.8 Setting Up Your Own Cluster

22m 35s

4.9 Spark Performance: Monitoring and Optimization

09m 25s

4.10 Tuning Your Spark Application

20m 7s

4.11 Making Spark Fly: Parallelism

07m 34s

4.12 Making Spark Fly: Caching

13m 5s

Chapter: Lesson 5: Advanced Applications


00m 53s

5.1 Machine Learning on Spark: MLlib and

13m 39s

5.2 The KDD Cup Competition: Preparing Data and Imputing Values

22m 43s

5.3 Introduction to Supervised Learning: Logistic Regression

17m 36s

5.4 Building a Model with MLlib

13m 9s

5.5 Model Evaluation and Metrics

14m 41s

5.6 Leveraging scikit-learn to Evaluate MLlib Models

21m 36s

5.7 Training Models with

16m 6s

5.8 Machine Learning Pipelines with

11m 3s

5.9 Tuning Models: Features, Cross Validation, and Grid Search

13m 43s

5.10 Serializing and Deploying Models

08m 22s

Chapter: Summary

Building Spark Applications LiveLessons: Summary

08m 5s