Hadoop, 2nd Edition

Video Training

The standard for large-scale data processing, Hadoop makes your data truly accessible. This Learning Path offers an in-depth tour of the Hadoop ecosystem, providing detailed instruction on setting up and running a Hadoop cluster, batch processing data with Pig, querying it with Hive’s SQL dialect, writing MapReduce jobs, and everything else you need to parse, access, and analyze your data.

Below are the video training courses included in this Learning Path.


Learning Apache Hadoop

Presented by Rich Morrow, 7 hours 37 minutes

This segment of your Learning Path starts with Hadoop basics, including Hadoop’s run modes, job types, and use in the cloud, then moves on to the Hadoop Distributed File System (HDFS). You’ll get an introduction to MapReduce, debugging basics, Hive and Pig basics, and Impala fundamentals. By the time this course concludes, you’ll be able to use these tools to work successfully in Hadoop.
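
As a taste of the MapReduce basics covered here, below is a minimal sketch of the canonical WordCount job: the mapper emits (word, 1) pairs and the reducer sums the counts per word. It assumes only that Hadoop’s MapReduce client libraries are on the classpath; input and output paths are passed on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: split each input line into tokens and emit (word, 1).
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer (also usable as a combiner): sum the counts for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }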


Introduction to Hadoop YARN

Presented by David Yahalom, 1 hour 26 minutes

You’ll begin your introduction to Apache Hadoop YARN (Yet Another Resource Negotiator) with a tour of the core Hadoop components, including MapReduce. From there, you’ll dive into YARN’s components and architecture. Once you’re familiar with those, you’ll practice scheduling, running, and monitoring applications in YARN, covering failure handling, logs, cluster resource allocation, and other essential topics.
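
As a small illustration of the monitoring side, the sketch below uses YARN’s YarnClient API to ask the ResourceManager for its list of applications. It assumes a running cluster whose yarn-site.xml is on the classpath; the class name ListYarnApps is just for illustration.

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ListYarnApps {
      public static void main(String[] args) throws Exception {
        // YarnConfiguration reads yarn-site.xml to locate the ResourceManager.
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();
        try {
          // One ApplicationReport per application the ResourceManager knows about.
          for (ApplicationReport app : client.getApplications()) {
            System.out.printf("%s  %s  state=%s  progress=%.0f%%%n",
                app.getApplicationId(), app.getName(),
                app.getYarnApplicationState(), app.getProgress() * 100);
          }
        } finally {
          client.stop();
        }
      }
    }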


Introduction to Apache Hive

Presented by Tom Hanlon, 1 hour 42 minutes

Hive is the tool you’ll use to create and query large datasets with SQL in Hadoop. You’ll begin this course by learning how to connect to Hive, then jump into learning how to create tables and load data. From there, you’ll explore more of Hive’s capabilities, learning to manipulate tables with HiveQL, create views and partitions, and transform data with custom scripts. Finally, you’ll learn about Hive execution engines, such as MapReduce, Tez, and Spark.
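
As a preview of the kind of work this course walks through, here is a minimal sketch of connecting to Hive over JDBC, creating a table, loading data, and running a HiveQL query. It assumes a HiveServer2 endpoint at localhost:10000 and the hive-jdbc driver on the classpath; the pageviews table and the /data/pageviews.tsv path are purely hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuickstart {
      public static void main(String[] args) throws Exception {
        // Hive's JDBC driver, shipped in the hive-jdbc artifact.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 endpoint: adjust host, port, and database.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
          // Create a table over tab-delimited text files, then load data into it.
          stmt.execute("CREATE TABLE IF NOT EXISTS pageviews "
              + "(user_id STRING, url STRING, ts BIGINT) "
              + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
          stmt.execute("LOAD DATA INPATH '/data/pageviews.tsv' INTO TABLE pageviews");
          // HiveQL is compiled into jobs by the configured execution engine
          // (MapReduce, Tez, or Spark).
          try (ResultSet rs = stmt.executeQuery(
                   "SELECT url, COUNT(*) AS hits FROM pageviews GROUP BY url")) {
            while (rs.next()) {
              System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
            }
          }
        }
      }
    }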


Hadoop Fundamentals for Data Scientists

Presented by Jenny Kim and Benjamin Bengfort, 6 hours 5 minutes

Learn more about the core concepts behind distributed computing and big data as you work with a Hadoop cluster and program analytical jobs. Using higher-level tools such as Hive and Spark, you’ll operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets. Once you’ve completed this course, you’ll understand how different parts of Hadoop combine to form an entire data pipeline.


Architectural Considerations for Hadoop Applications

Presented by Mark Grover, Gwen Shapira, Jonathan Seidman, and Ted Malaska, 2 hours 31 minutes

Implementing solutions with Apache Hadoop requires understanding not just Hadoop itself, but a broad range of related projects in the Hadoop ecosystem, such as Hive, Pig, Oozie, Sqoop, and Flume. Using clickstream analytics as an end-to-end example, you’ll see how to architect and implement a complete solution with Hadoop.