Below are the video training courses included in this Learning Path.
Learning Apache Cassandra
Presented by Ruth Stryker · 8 hours 6 minutes
Apache Cassandra is a distributed database management system for handling large amounts of data across many commodity servers. Get a solid understanding of Cassandra as you learn to use it for your own development projects. Begin with the basics of installing and communicating with Cassandra, then learn to create an application, work with clusters, and more.
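To see the core idea behind "handling large amounts of data across many commodity servers," here is a toy sketch of hash-based partitioning, the mechanism a Cassandra cluster uses to decide which node owns each row. The node names and the use of MD5 are illustrative only; Cassandra's default partitioner is actually Murmur3, and real clusters use a token ring rather than simple modulo assignment.

```python
import hashlib

# Illustrative node names; a real cluster discovers its members itself.
NODES = ["node-a", "node-b", "node-c"]

def node_for(partition_key: str) -> str:
    """Toy placement: hash the partition key, map it onto a node.

    Cassandra's real scheme (Murmur3 tokens on a ring, with
    replication) is more involved, but the principle is the same:
    the key alone determines where the row lives."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

for key in ["user:42", "user:43", "user:44"]:
    print(key, "->", node_for(key))
```

Because placement is a pure function of the key, any client can compute which server to contact without a central coordinator, which is what lets the cluster scale by adding commodity machines.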
Introduction to Apache Kafka
Presented by Gwen Shapira · 2 hours 55 minutes
Currently one of the hottest projects across the Hadoop ecosystem, Apache Kafka is a distributed, real-time data system that functions in a manner similar to a pub/sub messaging service, but with better throughput, built-in partitioning, replication, and fault tolerance. In this course, you’ll learn how to integrate Kafka into a data processing pipeline and become familiar with the entire Kafka ecosystem.
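The phrase "similar to a pub/sub messaging service" can be made concrete with a small in-memory sketch: producers append records to a topic log, and each consumer tracks its own read offset, so the same records can be consumed independently by many subscribers. This is a toy model only; Kafka's partitioning, replication, persistence, and fault tolerance are all omitted, and the class and method names are invented for illustration.

```python
from collections import defaultdict

class TopicLog:
    """Toy single-partition topic: an append-only list of records
    plus a per-consumer read offset (the essence of Kafka's model)."""

    def __init__(self):
        self.records = []
        self.offsets = defaultdict(int)  # consumer name -> next index to read

    def publish(self, record):
        # Producers only ever append; history is never rewritten.
        self.records.append(record)

    def poll(self, consumer):
        # Each consumer reads from its own offset, so consumers
        # progress independently of one another.
        start = self.offsets[consumer]
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch

log = TopicLog()
log.publish("click:home")
log.publish("click:cart")
print(log.poll("analytics"))  # ['click:home', 'click:cart']
log.publish("click:checkout")
print(log.poll("analytics"))  # ['click:checkout']
```

Keeping the consumer's position as a simple offset into an ordered log, rather than deleting messages on delivery, is what lets Kafka replay history and serve many independent pipelines from one stream.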
Introduction to Apache Spark
Presented by Paco Nathan · 4 hours 46 minutes
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. This course teaches you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan.
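The "explore data and apply algorithms" workflow rests on Spark's transformation style: chains of flatMap, map, and reduceByKey over distributed collections. The following is a plain-Python sketch of that style using a classic word count; the input lines are made up, and no actual Spark API is used here.

```python
from itertools import chain

# Made-up sample input, standing in for a distributed dataset.
lines = ["spark makes big data simple", "big data big insights"]

# flatMap: split every line into words, flattening into one stream.
words = chain.from_iterable(line.split() for line in lines)

# map: pair each word with a count of 1.
pairs = ((word, 1) for word in words)

# reduceByKey: sum the counts per word.
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts["big"])  # 3
```

In Spark the same three steps run in parallel across a cluster (`rdd.flatMap(...).map(...).reduceByKey(...)`), which is why mastering this small set of transformations goes a long way.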
Large-scale Real-time Stream Processing and Analytics
7 hours 4 minutes
Streaming data enables you to rapidly assess and respond to events, but only if you have the right methods for processing it. This step of your Learning Path includes videos of live sessions from Strata + Hadoop World 2015 in San Jose, California—you’ll learn about several analytics tools and event mining techniques from experts in the field.
An Introduction to Time Series with Team Apache
Presented by Patrick McFadin · 3 hours 50 minutes
Discover how Apache Cassandra can be a perfect fit for time series data, then add Apache Spark as its analytics companion. As you work through this hands-on course, you’ll build an end-to-end data pipeline to ingest, process, and store high-speed time series data.
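A key idea in storing high-speed time series in Cassandra is bucketing: grouping readings into coarse time windows so each partition stays bounded. Here is a toy sketch of hourly bucketing, where the bucket key plays the role a composite partition key such as (sensor_id, hour) would play in a Cassandra table; the sensor names and readings are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sensor readings: (sensor id, timestamp, value).
readings = [
    ("sensor-1", datetime(2015, 3, 1, 9, 5), 21.4),
    ("sensor-1", datetime(2015, 3, 1, 9, 40), 21.9),
    ("sensor-1", datetime(2015, 3, 1, 10, 2), 22.3),
]

# Bucket by (sensor, hour): truncate each timestamp to the hour,
# the same role a (sensor_id, hour) partition key plays in CQL.
buckets = defaultdict(list)
for sensor, ts, value in readings:
    hour = ts.replace(minute=0, second=0, microsecond=0)
    buckets[(sensor, hour)].append(value)

for key, values in sorted(buckets.items()):
    print(key, values)
```

Bounding each bucket to one hour keeps any single partition from growing without limit as data streams in, while still letting a query for "sensor-1, 9am to 10am" touch only the relevant buckets.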