Getting Up and Running with Apache Spark

Video Training

Spark is a powerful distributed computing engine for big data and has emerged as a leading tool in the industry, with a focus on efficiency and usability. The tutorials and sessions in this Learning Path cover the Spark 2.0 libraries, tips and tricks for deploying Spark in production and at scale, and how to get up and running with Spark so you can write your own applications.

Below are the video training courses included in this Learning Path.

1

The State of Spark and Where It Is Going in 2016

Presented by Reynold Xin (39 minutes)

In this talk, Reynold Xin outlines three important Spark innovations, how to use them, and their implications for Spark users. The trends behind them are tighter integration of streaming systems with existing enterprise data infrastructure, elasticity and cloud computing for enterprise data infrastructure, and the rise of new hardware such as SSDs, GPUs, and 3D XPoint, which brings abundant computing resources.

2

Mastering Spark for Structured Streaming

Presented by Tianhui Michael Li (1 hour 47 minutes)

This is a hands-on tutorial geared toward data engineers, data scientists, and data analysts. You’ll learn about the Spark Structured Streaming API, the Catalyst query optimizer, and the Tungsten execution engine while building applications that draw on all aspects of Spark 2.0.

3

Introduction to PySpark

Presented by Alex Robbins (3 hours 21 minutes)

This video covers everything you need to know about the Spark Python API. You’ll install Spark, then jump into the Spark fundamentals. You’ll master transformations (including filter, pipe, repartition, and distinct) as well as actions, input and output, performance, and running on a cluster. Finally, you’ll learn advanced topics, including Spark Streaming, DataFrames and SQL, and MLlib. Working files are included so that you can follow along.

4

Spark in Production: Tips and Tricks

Presented by Vida Ha, Holden Karau, Ted Malaska, and Mark Grover (1 hour 58 minutes)

This is a three-video compilation from the O’Reilly Strata+Hadoop World 2016 conferences. It includes “How to use Apache Spark properly in your big data architecture,” “Beyond shuffling: Tips and tricks for scaling Spark jobs,” and “Top 5 mistakes when writing Spark applications.”

5

Study Guide for the Developer Certification for Apache Spark

Presented by Olivier Girardot (3 hours 41 minutes)

This video will teach you everything you need to know to prepare for and pass the Developer Certification for Apache Spark. The course is designed for users who are already familiar with Python, Java, and Scala.