Video description
Spark is one of today’s most popular distributed computation engines for processing and analyzing big data. This course gives data engineers, data scientists, and data analysts who want to explore data streaming technology practical experience with Spark. In this hands-on course, you’ll learn about the Spark Structured Streaming API, the powerful Catalyst query optimizer, the Tungsten execution engine, and more, and you’ll build several small applications that draw on all of these aspects of Spark 2.0. Scala experience is not required, but the course works best for those who have some.
- Understand the main features of Spark and its advantages over existing systems
- Learn the basics of parallelism, streaming computation, and Spark streaming
- Explore the distinctions between Spark Structured Streaming and legacy DStream APIs
- Understand how to write applications with the Spark Structured Streaming API
- Learn about the new Catalyst query optimizer and the Tungsten execution engine
- Discover how Scala and Spark Structured Streaming simplify distributed streaming tasks
- Gain hands-on experience building applications using Spark 2.0
Michael Li is the founder of The Data Incubator, which provides big data corporate training and a selective eight-week fellowship for PhDs transitioning into industry. Previously, he worked as a data scientist, software engineer, and researcher at Foursquare, Google, Andreessen Horowitz, J.P. Morgan, and NASA. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review. Michael earned his Ph.D. at Princeton and was a Marshall Scholar in Cambridge.
Table of contents
- Overview
- Spark Datasets and Structured Streaming
Spark Structured Streaming
- Spark Structured Streaming
- Netcat Socket Structured Streaming Example
- Socket Structured Streaming Example
- Spark Structured Streaming Parsing Data
- Constructing Columns in Structured Streaming
- Selecting and Filtering Columns Using Structured Streaming
- GroupBy and Aggregation in Structured Streaming
- Joining Structured Stream with Datasets
- SQL Queries in Spark Structured Streaming
DStream Comparison
- Comparing Structured Streaming with DStream
- Custom Receivers in Spark DStream
- Iterative Wordcount Using Spark DStream
- Cumulative Wordcount Using Spark DStream
- Benefits of Spark Tungsten
- Tungsten Performance Benefit Demonstration
- Benefits of Spark Catalyst
- Viewing Query Plans in Spark Shell
- Visualizing Query Stages in Spark UI Viewer
- Viewing Spark Catalyst-Optimized Physical Plans
- Standalone Spark Streaming Applications
Product information
- Title: Mastering Spark for Structured Streaming
- Author(s):
- Release date: November 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491974438
You might also like
video
Spark in Motion
See it. Do it. Learn it! Spark in Motion teaches you to use Spark for big …
book
Scala and Spark for Big Data Analytics
Harness the power of Scala to program Spark and analyze tonnes of data in the blink …
video
Using Flume: Integrating Flume with Hadoop, HBase and Spark
In this webcast, Hari Shreedharan, the author of Using Flume will discuss how to use Flume …
video
Apache Spark with Scala – Hands-On with Big Data!
“Big data” analysis is a hot and highly valuable skill—and this course will teach you the …