Architect and Build Big Data Applications

Video Training

With datasets growing increasingly large, the need for custom data solutions has soared as well. This Learning Path will take you through the entire process of designing and building data applications that can visualize, navigate, and interpret reams of data. Get a thorough introduction to the most important tools in the big data ecosystem.

Below are the video training courses included in this Learning Path.


Introduction to Big Data

Presented by Vladimir Bacvanski 2 hours 58 minutes

Start your exploration of big data, Hadoop, NoSQL, and related technologies here. You’ll learn what big data is and how to process it with MapReduce and Hadoop, including several ways to program big data applications. You’ll also cover NoSQL stores and their best uses and then conclude with NoSQL in the enterprise.


Learning Apache Cassandra

Presented by Ruth Stryker 8 hours 6 minutes

Apache Cassandra is a distributed database management system for handling large amounts of data across many commodity servers. Get a solid understanding of Cassandra as you learn to use it for your own development projects. Begin with the basics of installing and communicating with Cassandra, then learn to create an application, work with clusters, and more.


Introduction to Apache Kafka

Presented by Gwen Shapira 2 hours 55 minutes

Currently one of the hottest projects across the Hadoop ecosystem, Apache Kafka is a distributed, real-time data system that functions in a manner similar to a pub/sub messaging service, but with better throughput, built-in partitioning, replication, and fault tolerance. In this course, you’ll learn how to integrate Kafka into a data processing pipeline and become familiar with the entire Kafka ecosystem.


Introduction to Apache Spark

Presented by Paco Nathan 4 hours 46 minutes

Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. This course teaches you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan.


Building Big Data Platforms

Presented by O'Reilly Media, Inc. 5 hours 45 minutes

What kinds of platforms have Netflix, LinkedIn, CERN, and PayPal constructed to handle big data operations unique to their businesses? And how can you apply some of these solutions to your own business? This course presents case studies from a variety of organizations that will be helpful as you build your own big data infrastructure.


Architectural Considerations for Hadoop Applications

Presented by Mark Grover, Gwen Shapira, Jonathan Seidman, and Ted Malaska 2 hours 31 minutes

Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects in the Hadoop ecosystem such as Hive, Pig, Oozie, Sqoop, and Flume. Using clickstream analytics as an end-to-end example, you’ll see how to architect and implement a complete solution with Hadoop.


An Introduction to Time Series with Team Apache

Presented by Patrick McFadin 3 hours 50 minutes

As it becomes easier to create data, we’re faced with the need to collect and analyze it at scales never seen before. Learn how to solve time-series data problems with technologies from Team Apache: Kafka, Spark, and Cassandra. Using these technologies, you’ll work with an example weather collection network and the challenges it presents.