Video description
Get up to speed on Apache Spark for building big data applications in Python, Java, or Scala. Recently updated with nearly an hour of new footage on DataFrames in Spark 1.3, this video workshop shows you how to explore data and apply algorithms with MLlib, GraphX, and Spark SQL. You’ll learn Spark and its core APIs by doing hands-on technical exercises with presenter Paco Nathan, host of the popular Just Enough Math video workshop.
With this workshop, you will:
- Get going with the newest features of Spark 1.3
- Open a Spark shell
- Develop Spark apps for typical use cases
- Use some machine-learning algorithms
- Explore data sets loaded from HDFS or another filesystem
- Work with Spark SQL, Spark Streaming, and Spark’s machine-learning library, MLlib
- Use Maven, SBT, IPython Notebook, and other tooling
- Learn about Spark follow-up courses and certification
Paco Nathan has led innovative data teams building large-scale apps for several years. He’s an expert in distributed systems, machine learning, cloud computing, and functional programming.
Table of contents
- Pre-Flight Check
- Spark Deconstructed
- A Brief History
- Simple Spark Apps
- Spark Essentials
- Spark Examples
- Unifying the Pieces - Spark SQL
- Unifying the Pieces - Spark Streaming
- Unifying the Pieces - MLlib and GraphX
- Unified Workflows Demo
- The Full SDLC
- Developer Certification
- Resources
- Introduction - Why DataFrames?
- ETL to Prepare the Data from Capital Bikeshare
- Create a DataFrame, Explore using SQL
- Data Preparation for Machine Learning Models
- Build a Classifier Using Naive Bayes
- Build a Classifier Using Decision Trees
- Build a Classifier Using Random Forests
- Use a DataFrame to Compare Models
- Parquet as a Best Practice with DataFrames
- How to Store a DataFrame with Parquet
- How to Read a DataFrame Back in From Parquet
- Use SQL to Estimate Route Durations
- Data Preparation for GraphX - Model Route Costs
- Use PageRank to Rank Popular Stations
- Optimize Routes to Columbus Circle
- Compare Results with Google Maps
- Analyze a Popular Tourist Route
- Examples of How to Use DataFrames in Python
- Summary - The New DataFrames Features in Spark
Product information
- Title: Introduction to Apache Spark
- Author(s): Paco Nathan
- Release date: March 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491919729