Using Spark in the Hadoop Ecosystem

Video description

You're new to Big Data, you've heard about Apache Spark and Apache Hadoop and you want to play. Big Data coach Rich Morrow gets you into the game via sixteen sprints (sixteen hands-on labs) across the Spark-Hadoop ball field. First, you'll create playing areas using Amazon Web Services EMR and Cloudera Quickstart VM. Then you'll install Hadoop, run basic HDFS commands, learn MapReduce, use Flume and Sqoop, run Spark and then run Spark again.

You'll play with Spark SQL, learn common MLlib usage, do analysis with Hive, ETL with Pig, and then jog through Hadoop/Cloud use cases, HBase basics, and enterprise integration. When practice is over, you'll know Spark, its associated modules, the Hadoop ecosystem, and the when, where, how, and why of each technology's use. Working files are included, allowing you to follow along with the author throughout the lessons. Play on.

  • Understand Apache Spark and why it's Big Data's fastest-growing open source project
  • Learn what Apache Hadoop is and how it's used in the world of Big Data
  • Master the basics of Hadoop - HDFS, YARN and MapReduce
  • Master the basics of Spark - Spark SQL, MLlib, Spark Streaming, GraphX and more
  • Discover Sqoop, Flume, Hive, Pig, HBase, and Oozie - key components of Hadoop
  • Gain direct experience with Spark and Hadoop with sixteen hands-on labs

Rich Morrow is a 20+ year veteran of IT and an expert in big data and cloud technologies. He's used Hadoop and AWS for over 6 years in his consulting practice, quicloud.com, and has taught Cloudera (Hadoop) and AWS for Global Knowledge (where he also serves as Course Director for Cloud and Big Data) for over 4 years. Rich retains all certifications for both AWS and Cloudera, and is a prolific writer and speaker on Cloud, Big Data, DevOps/Agile, Mobile, and IoT topics, including the O'Reilly titles Hands-on with Amazon Redshift, Learning Apache Hadoop, and Cloud Computing With AWS.

Table of contents

  1. Introduction
    1. Course Introduction 00:04:21
    2. About The Author 00:04:14
    3. What Is Big Data 00:11:07
    4. Historical Approaches 00:07:04
    5. Modern-Day Approach 00:12:42
    6. What Is Hadoop 00:11:05
    7. Hadoop Core Vs Ecosystem 00:05:03
    8. Hadoopable Problems 00:06:37
  2. Hadoop Basics
    1. HDFS And Yarn 00:08:14
    2. Hive And Pig Interface Introduction 00:05:59
    3. Introduction To Spark 00:04:37
    4. Hadoop In The Cloud (Amazon Web Services Intro) 00:08:49
    5. Installing Hadoop Into EMR Part - 1 00:15:31
    6. Installing Hadoop Into EMR Part - 2 00:15:34
    7. Installing Cloudera Quickstart VM 00:11:01
    8. Web GUIs 00:11:06
  3. Hadoop Distributed Filesystem (HDFS)
    1. HDFS Architecture 00:10:05
    2. HDFS File Write Walkthrough 00:17:57
    3. Secondary Name Node 00:06:38
    4. Basic HDFS Commands 00:09:23
    5. Using HDFS Commands Part - 1 00:07:34
    6. Using HDFS Commands Part - 2 00:09:27
    7. HA And Federation Basics 00:12:48
    8. HDFS Access Controls (Or Lack Thereof) 00:09:34
  4. Yarn
    1. Yarn Purpose 00:06:16
    2. Yarn Architecture 00:07:25
    3. Yarn With Spark 00:06:44
  5. MapReduce
    1. MapReduce Explained 00:11:52
    2. MapReduce Architecture 00:07:36
    3. MapReduce Code Walkthrough 00:11:59
    4. MapReduce Details Walkthrough 00:04:45
    5. Running MapReduce Job 00:08:59
  6. HDFS Data Import And Export
    1. Import/Export Options 00:11:12
    2. Flume Introduction 00:10:53
    3. Using Flume 00:13:43
    4. Sqoop Introduction 00:09:25
    5. Using Sqoop 00:17:01
    6. HDFS Interaction Tools 00:06:01
    7. Oozie Introduction 00:10:17
  7. Spark Basics
    1. Spark Value Propositions 00:08:30
    2. Spark Run Modes (Yarn, Standalone, Mesos) 00:07:33
    3. RDDs And Dataframes 00:17:24
    4. Hands On Spark Part - 1 00:08:12
    5. Hands On Spark Part - 2 00:10:38
    6. Running Spark Part - 1 00:09:58
    7. Running Spark Part - 2 00:13:55
    8. Optimizing And Debugging Spark 00:18:17
    9. Spark Libraries Overview 00:09:05
  8. Spark Built-In Libraries
    1. Spark SQL 00:09:01
    2. Spark SQL Usage 00:12:02
    3. MLlib Basics 00:15:30
    4. Common MLlib Usage Part - 1 00:15:02
    5. Common MLlib Usage Part - 2 00:08:23
    6. Spark Streaming 00:12:43
    7. GraphX 00:09:58
  9. Hive And Pig
    1. Hive Vs Pig 00:09:53
    2. Hive Basics 00:11:53
    3. Analysis With Hive 00:10:54
    4. Pig Basics 00:14:38
    5. ETL And Analytics With Pig 00:20:16
  10. Hadoop In The Cloud
    1. Hadoop/Cloud Use Cases 00:05:16
    2. Elastic MapReduce (EMR) 00:12:47
  11. Ecosystem
    1. HBase Basics 00:11:16
    2. Enterprise Integration 00:10:39
  12. Wrap Up
    1. Wrap Up 00:03:41

Product information

  • Title: Using Spark in the Hadoop Ecosystem
  • Author(s): Rich Morrow
  • Release date: June 2016
  • Publisher(s): Infinite Skills
  • ISBN: 9781771375658