Below are the video training courses included in this Learning Path.
Learning Apache Hadoop
Presented by Rich Morrow (7 hours 37 minutes)
This segment of your Learning Path starts with Hadoop basics, including Hadoop run modes, job types, and Hadoop in the cloud, then moves on to the Hadoop Distributed File System (HDFS). You’ll get an introduction to MapReduce, debugging basics, Hive and Pig basics, and Impala fundamentals. As this course concludes, you’ll be able to use these tools and functions to work successfully in Hadoop.
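To give a feel for the MapReduce model the course introduces, here is a minimal word-count sketch in the style of Hadoop Streaming, where Hadoop pipes input lines to a mapper on stdin and feeds the sorted mapper output to a reducer. This is an illustrative standalone script, not material from the course; the shuffle/sort step that a real cluster performs is simulated with a plain `sorted()` call.

```python
#!/usr/bin/env python3
# Word count in the Hadoop Streaming style: mapper emits (word, 1) pairs,
# reducer sums counts per word. Both stages read and write tab-separated
# lines, the convention Hadoop Streaming uses between processes.
import sys
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Sum counts per word; assumes input is sorted by key,
    as it would be after Hadoop's shuffle/sort phase."""
    keyed = (p.split("\t") for p in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Simulate the shuffle/sort that Hadoop performs between stages.
    mapped = sorted(mapper(sys.stdin))
    for line in reducer(mapped):
        print(line)
```

On a real cluster the same two functions would run as separate mapper and reducer processes, each handling only its slice of the data.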
Introduction to Hadoop YARN
Presented by David Yahalom (1 hour 26 minutes)
You’ll begin your introduction to Apache Hadoop YARN (Yet Another Resource Negotiator) with a tour of the core Hadoop components, including MapReduce. From there, you’ll dive into the YARN components and architecture. Once you’re familiar with those, you’ll practice scheduling, running, and monitoring applications in YARN, including failure handling, YARN logs, YARN cluster resource allocation, and other essential YARN topics.
Introduction to Apache Hive
Presented by Tom Hanlon (1 hour 42 minutes)
Hive is the tool you’ll use to create and query large datasets with SQL in Hadoop. You’ll begin this course by learning how to connect to Hive, then jump into learning how to create tables and load data. From there, you’ll explore more of Hive’s capabilities, learning to manipulate tables with HiveQL, create views and partitions, and transform data with custom scripts. Finally, you’ll learn about Hive execution engines, such as MapReduce, Tez, and Spark.
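As a taste of the table creation, data loading, and partitioning tasks described above, here is a short HiveQL sketch. The table and column names are illustrative assumptions, not examples from the course itself.

```sql
-- Hypothetical partitioned table of web logs.
CREATE TABLE logs (
  ip     STRING,
  url    STRING,
  status INT
)
PARTITIONED BY (log_date STRING)
STORED AS ORC;

-- Load one day's staged data into its partition.
LOAD DATA INPATH '/staging/logs/2023-01-01'
  INTO TABLE logs PARTITION (log_date = '2023-01-01');

-- Query a single partition; Hive prunes the rest.
SELECT url, COUNT(*) AS hits
FROM logs
WHERE log_date = '2023-01-01'
GROUP BY url;
```

Because the `WHERE` clause names the partition column, Hive reads only that day's files rather than scanning the whole table.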
Hadoop Fundamentals for Data Scientists
Presented by Jenny Kim and Benjamin Bengfort (6 hours 5 minutes)
Learn more about the core concepts behind distributed computing and big data as you work with a Hadoop cluster and program analytical jobs. Using higher-level tools such as Hive and Spark, you’ll operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets. Once you’ve completed this course, you’ll understand how different parts of Hadoop combine to form an entire data pipeline.
Architectural Considerations for Hadoop Applications
Presented by Mark Grover, Gwen Shapira, Jonathan Seidman, and Ted Malaska (2 hours 31 minutes)
Implementing solutions with Apache Hadoop requires understanding not just Hadoop itself, but a broad range of related projects in the Hadoop ecosystem, such as Hive, Pig, Oozie, Sqoop, and Flume. Using clickstream analytics as an end-to-end example, you’ll see how to architect and implement a complete solution with Hadoop.