Scaling Python for Big Data

Video Training

If you have some Python experience and want to take it to the next level, this practical, hands-on Learning Path will be a helpful resource. Its video tutorials show you how to use Python for distributed task processing and how to perform large-scale data processing in Spark using the PySpark API.

Below are the video training courses included in this Learning Path.

1. Building Data Pipelines with Python

Presented by Katharine Jarmul (3 hours 40 minutes)

You’ll learn how to get started using Python for distributed task processing and how to scale your Python data pipelines through parallel distribution. In this video, we’ll compare many of the more popular task managers, including Celery, Dask, Airflow, Spark, and Django Channels. We’ll also cover how to manage scripts and measure performance within your distributed workflow using systems-level knowledge and tooling.
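For a taste of what this course covers, here is a minimal sketch of distributing tasks across local worker processes with Dask, one of the task managers compared above. It assumes dask[distributed] is installed; the transform function and its inputs are hypothetical placeholders, not material from the course.

    from dask.distributed import Client

    def transform(record):
        # Hypothetical per-record work; swap in real pipeline logic.
        return record * 2

    if __name__ == "__main__":
        client = Client()  # starts a local scheduler plus worker processes
        futures = client.map(transform, range(100))  # fan tasks out to workers
        results = client.gather(futures)             # pull results back
        print(sum(results))
        client.close()

The same code scales beyond one machine by pointing Client at a remote scheduler address instead of starting a local cluster.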

2. Introduction to PySpark

Presented by Alex Robbins (3 hours 21 minutes)

You’ll learn everything you need to know about the Spark Python API, from fundamentals to transformations, performance, and running on a cluster. This video also covers advanced topics, including Spark Streaming, DataFrames, SQL, and MLlib.
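For a flavor of the API, here is a minimal PySpark sketch touching several of the topics listed: an RDD transformation, a DataFrame, and a SQL query. The data and column names are illustrative assumptions, not examples from the course.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()

    # RDD transformation: square each element, then reduce to a sum.
    rdd = spark.sparkContext.parallelize(range(10))
    total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)

    # The same data as a DataFrame, queried with Spark SQL.
    df = spark.createDataFrame([(i, i * i) for i in range(10)], ["n", "square"])
    df.createOrReplaceTempView("squares")
    spark.sql("SELECT n, square FROM squares WHERE square > 25").show()

    print(total)
    spark.stop()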