Table of Contents

Chapter: Learn all the buzzwords! And install Hadoop

[Activity] Introduction, and install Hadoop on your desktop!

16m 59s

Hadoop Overview and History

07m 44s

Overview of Hadoop Ecosystem

16m 46s

Tips for Using This Course

01m 26s

Chapter: Using Hadoop's Core: HDFS and MapReduce

HDFS: What it is, and how it works

13m 53s

[Activity] Install the MovieLens dataset into HDFS using the Ambari UI

06m 20s

[Activity] Install the MovieLens dataset into HDFS using the command line

07m 50s

MapReduce: What it is, and how it works

10m 40s

How MapReduce distributes processing

12m 57s

MapReduce example: Break down movie ratings by rating score

11m 35s

[Activity] Installing Python, MRJob, and nano

07m 34s

[Activity] Code up the ratings histogram MapReduce job and run it

07m 36s

[Exercise] Rank Movies by their popularity

07m 6s

[Activity] Check your results against mine!

08m 23s

Chapter: Programming Hadoop with Pig

Introducing Ambari

09m 49s

Introducing Pig

06m 25s

Example: Find the oldest movie with 5-star rating using Pig

15m 7s

[Activity] Find old 5-star movies with Pig

09m 40s

More Pig Latin

07m 34s

[Exercise] Find the most-rated one-star movie

01m 56s

Pig Challenge: Compare Your Results to Mine!

05m 37s

Chapter: Programming Hadoop with Spark

Why Spark?

10m 6s

The Resilient Distributed Dataset (RDD)

10m 14s

[Activity] Find the movie with the lowest average rating - with RDDs

15m 33s

Datasets and Spark 2.0

06m 28s

[Activity] Find the movie with the lowest average rating - with DataFrames

10m 0s

[Activity] Movie recommendations with MLlib

12m 16s

[Exercise] Filter the lowest-rated movies by number of ratings

02m 51s

[Activity] Check your results against mine!

06m 40s

Chapter: Using relational data stores with Hadoop

What is Hive?

06m 31s

[Activity] Use Hive to find the most popular movie

10m 45s

How Hive Works

09m 10s

[Exercise] Use Hive to find the movie with the highest average rating

01m 55s

Compare your solution to mine

04m 10s

Integrating MySQL with Hadoop

08m 0s

[Activity] Install MySQL and import our movie data

07m 35s

[Activity] Use Sqoop to import data from MySQL to HDFS/Hive

07m 31s

[Activity] Use Sqoop to export data from Hadoop to MySQL

07m 16s

Chapter: Using non-relational data stores with Hadoop

Why NoSQL?

13m 55s

What is HBase?

12m 55s

[Activity] Import movie ratings into HBase

13m 28s

[Activity] Use HBase with Pig to import data at scale

11m 19s

Cassandra Overview

14m 50s

[Activity] Installing Cassandra

11m 44s

[Activity] Write Spark output into Cassandra

11m 0s

MongoDB overview

16m 54s

[Activity] Install MongoDB, and integrate Spark with MongoDB

12m 44s

[Activity] Using the MongoDB shell

07m 48s

Choosing a database technology

15m 59s

[Exercise] Choose a database for a given problem

05m 0s

Chapter: Querying Your Data Interactively

Overview of Drill

07m 55s

[Activity] Setting up Drill

11m 19s

[Activity] Querying across multiple databases with Drill

07m 7s

Overview of Phoenix

08m 55s

[Activity] Install Phoenix and query HBase with it

07m 8s

[Activity] Integrate Phoenix with Pig

11m 45s

Overview of Presto

06m 39s

[Activity] Install Presto, and query Hive with it

12m 27s

[Activity] Query both Cassandra and Hive using Presto

09m 1s

Chapter: Managing your Cluster

YARN Explained

10m 1s

Tez explained

04m 56s

[Activity] Use Hive on Tez and measure the performance benefit

08m 35s

Mesos explained

07m 13s

ZooKeeper explained

13m 10s

[Activity] Simulating a failing master with ZooKeeper

06m 47s

Oozie explained

11m 56s

[Activity] Set up a simple Oozie workflow

16m 39s

Zeppelin overview

05m 2s

[Activity] Use Zeppelin to analyze movie ratings, part 1

12m 28s

[Activity] Use Zeppelin to analyze movie ratings, part 2

09m 46s

Hue Overview

08m 8s

Other technologies worth mentioning

04m 35s

Chapter: Feeding Data to your Cluster

Kafka explained

09m 48s

[Activity] Setting up Kafka, and publishing some data

07m 24s

[Activity] Publishing web logs with Kafka

10m 21s

Flume explained

10m 16s

[Activity] Set up Flume and publish logs with it

07m 46s

[Activity] Set up Flume to monitor a directory and store its data in HDFS

09m 12s

Chapter: Analysing Streams of Data

Spark Streaming: Introduction

14m 27s

[Activity] Analyze web logs published with Flume using Spark Streaming

14m 20s

[Exercise] Monitor Flume-published logs for errors in real time

02m 2s

Exercise solution: Aggregating HTTP access codes with Spark Streaming

04m 24s

Apache Storm: Introduction

09m 27s

[Activity] Count words with Storm

14m 35s

Flink: An Overview

06m 53s

[Activity] Counting words with Flink

10m 20s

Chapter: Designing Real-World Systems

The Best of the Rest

09m 24s

Review: How the pieces fit together

06m 29s

Understanding your requirements

08m 2s

Sample application: consume web server logs and keep track of top-sellers

10m 6s

Sample application: serving movie recommendations to a website

11m 18s

[Exercise] Design a system to report web sessions per day

02m 52s

Exercise solution: Design a system to count daily sessions

04m 24s

Chapter: Learning More

Books and online resources

05m 32s

Bonus lecture: Discounts on my other big data / data science courses!

02m 25s