Below are the video training courses included in this Learning Path.
Building Data Pipelines with Python
Presented by Katharine Jarmul · 3 hours 40 minutes
You’ll learn how to get started using Python for distributed task processing and how to scale your Python data pipelines through parallel execution. In this video, we’ll compare several of the more popular task managers, including Celery, Dask, Airflow, Spark, and Django Channels. We’ll also cover how to manage scripts and measure performance within your distributed workflow using systems-monitoring knowledge and tools.
Introduction to PySpark
Presented by Alex Robbins · 3 hours 21 minutes
You’ll learn everything you need to know about the Spark Python API, from fundamentals through transformations, performance tuning, and running on a cluster. This video also covers advanced topics, including Spark Streaming, DataFrames, SQL, and MLlib.