Architecting Big Data Applications

A Beginner’s Guide to Architecting Big Data Applications

Video Training

Whether you’re a data engineer who needs to plan and implement a big data pipeline or a manager interested in learning how tools in the Hadoop technology stack address business goals, these videos will walk you through planning your big data solution. You’ll receive an introduction to Apache Hadoop concepts and training on key big data technologies, including Apache HBase, YARN, Apache Cassandra, Apache Kafka, Apache Spark, and Alluxio.

Below are the video training courses included in this Learning Path.


Introduction to the Hadoop Technology Stack

Presented by Justin Watkins (1 hour 30 minutes)

This video introduces the Hadoop technology stack, including Hadoop, MapReduce, and Spark, as well as tools such as Sqoop, Flume, Pig, Hive, HCatalog, and Apache Storm.


Introduction to Hadoop YARN

Presented by David Yahalom (1 hour 26 minutes)

This video covers the essentials of YARN. You’ll learn how to schedule, run, and monitor applications on YARN, including job execution, failure handling, YARN logs, and cluster resource allocation.


Introduction to Apache HBase Operations

Presented by Jonathan Hsieh (3 hours 44 minutes)

This is a complete overview of Apache HBase operations. You’ll learn how to install, administer, tune, and troubleshoot HBase deployments.


Learning Apache Cassandra

Presented by Ruth Stryker (8 hours 6 minutes)

Learn what you need to know about Cassandra to use it in your own development projects. You’ll learn how to install it, create a database and a table, and insert and model data. You’ll also learn how to build an application, update and delete data, select hardware, and add nodes to a cluster, as well as how to monitor a cluster, repair and remove nodes, and redefine a cluster. Working files are included so you can follow along with the author throughout the lessons.


Introduction to Apache Kafka

Presented by Gwen Shapira (2 hours 55 minutes)

You’ll start with Kafka basics, walk through code examples of Kafka producers and consumers, and then learn how to integrate Kafka with Hadoop. By the end of this video, you’ll be ready to use Kafka for large-scale log collection and stream processing.


Introduction to Apache Spark

Presented by Paco Nathan (4 hours 46 minutes)

This video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by working through hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.


Introduction to Alluxio

Presented by Calvin Jia (49 minutes)

If you want to improve the performance of your workloads, develop applications with Alluxio, or deploy and manage Alluxio clusters, this hands-on overview is for you. It shows you how to set up your own Alluxio deployment, both locally and on a cluster, run a compute framework on top of Alluxio, and connect it to multiple persistent data stores while preserving a single namespace.