A Beginner's Guide to Architecting Big Data Applications

Video description

Whether you’re a data engineer who needs to plan and implement a big data pipeline or a manager interested in learning how tools in the Hadoop technology stack address business goals, these videos will walk you through how to plan your big data solution. You’ll receive an introduction to the concepts of Apache Hadoop, and training on key components including Apache HBase, YARN, Cassandra, Kafka, and Spark.

Table of contents

  1. Introduction
    1. Introduction And Course Overview
    2. About The Author
    3. Getting Started With A Hadoop Installation
  2. What Is Hadoop?
    1. What Is Hadoop?
    2. What Is HDFS? - Scalable Storage
    3. Understanding Block Storage
    4. Block Replication And Resilience
    5. HDFS Architecture - The Name Node And The Data Nodes
    6. Parallel Performance
    7. What Is YARN? - Scalable Compute
    8. YARN: Plug-In Processing Engines
    9. Overview Of MapReduce
    10. Using Different Languages
  3. Options For Data Input
    1. Importing Data
    2. The Hadoop Client
    3. Overview Of Sqoop
    4. Overview Of Flume
    5. Other Import Tools
  4. Hadoop Tools
    1. What Is Pig?
    2. What Is Hive?
    3. Comparing Hive To SQL
    4. Hive Architecture
    5. What Is HCatalog?
    6. Hive Interfaces
    7. Apache Storm
    8. Apache Spark
    9. Hadoop Security
    10. Overview Of Oozie
    11. Mahout
    12. HBase And Other Data Stores: HBase, Accumulo, Etc.
    13. Apache Kafka
    14. Cluster Management
  5. Conclusion
    1. Distributions And Where To Go From Here
    2. Conclusion
  6. Introduction
    1. Course Agenda And Instructor
  7. Core Hadoop Components
    1. Basic Overview Of Hadoop Core Components: HDFS
    2. Hadoop Core Components Overview
    3. What Is MapReduce?
  8. YARN: Components And Architecture
    1. Pre-YARN Architecture
    2. YARN Architecture And Daemons
  9. Scheduling, Running And Monitoring Applications In YARN
    1. Running Jobs In YARN
    2. YARN Parameters
    3. YARN Cluster Resource Allocation
    4. Failure Handling
    5. YARN Logs
    6. Hands On With YARN
  10. Conclusion
    1. Summary
  11. Introduction
    1. What Is HBase?
    2. What To Expect
    3. About The Author
  12. Administration Basics
    1. HBase Deployment Architecture
    2. HBase Fault Tolerance
    3. Hardware Recommendations
    4. Software Recommendations
    5. HBase Deployment At Scale
    6. Installation With Cloudera Manager
    7. Basic Static Configuration
    8. Rolling Restarts And Upgrades
    9. Interacting With HBase
  13. Troubleshooting
    1. Troubleshooting Methodology
    2. Troubleshooting Distributed Clusters
    3. Administration From The Command Line
    4. Using The HBase UI
    5. Using The Metrics
    6. Using The Logs
  14. Tuning
    1. Basic HBase Tuning
    2. Generating Load And Load Test Tool
    3. Generating With YCSB
    4. Region Tuning
    5. Table Storage Tuning
    6. Memory Tuning
    7. Tuning With Failures
    8. Tuning For Modern Hardware
  15. Operations Continuity
    1. Operational Continuity
    2. Corruption: hbck
    3. Corruption: Other Tools
    4. Security
    5. Security Demo
    6. Backups: Snapshots
    7. Backups: Import / Export / Copy Table
    8. Cluster Replication
  16. Ecosystem
    1. HBase Proxy Servers, Thrift And REST
    2. Hue
    3. HBase With Apache Phoenix
  17. Conclusion
    1. Wrapup And Thank You
  18. Introduction To Cassandra
    1. Introducing The Course
    2. Understanding What Cassandra Is
    3. Learning What Cassandra Is Being Used For
    4. Understanding The System Requirements
    5. Opening The Main Virtual Machine
    6. Pop Quiz - Intro to Cassandra
  19. Getting Started With The Architecture
    1. Understanding That Cassandra Is A Distributed Database
    2. Learning What Snitch Is For
    3. Learning What Gossip Is For
    4. Learning How Data Gets Distributed
    5. Learning About Replication
    6. Learning About Virtual Nodes
    7. Pop Quiz - Getting Started with Architecture
  20. Installing Cassandra
    1. Downloading Cassandra
    2. Ensuring Oracle Java 7 Is Installed
    3. Installing Cassandra
    4. Viewing The Main Configuration File
    5. Providing Cassandra With Permission To Directories
    6. Starting Cassandra
    7. Checking Status
    8. Accessing The Cassandra system.log File
    9. Pop Quiz - Installing Cassandra
  21. Communicating With Cassandra
    1. Understanding Ways To Communicate With Cassandra
    2. Using CQLSH
    3. Pop Quiz - Communicating with Cassandra
  22. Creating A Database
    1. Understanding A Cassandra Database
    2. Defining A Keyspace
    3. Deleting A Keyspace
    4. Pop Quiz - Creating a Database
    5. Lab: Create A Second Database
  23. Creating A Table
    1. Creating A Table
    2. Defining Columns And Data Types
    3. Defining A Primary Key
    4. Recognizing A Partition Key
    5. Specifying A Descending Clustering Order
    6. Pop Quiz - Creating a Table
    7. Lab: Create A Second Table
  24. Inserting Data
    1. Understanding Ways To Write Data
    2. Using The INSERT INTO Command
    3. Using The COPY Command
    4. How Data Is Stored In Cassandra
    5. How Data Is Stored On Disk
    6. Pop Quiz - Inserting Data
    7. Lab: Insert Data
  25. Modeling Data
    1. Understanding Data Modeling In Cassandra
    2. Using A WHERE Clause
    3. Understanding Secondary Indexes
    4. Creating A Secondary Index
    5. Defining A Composite Partition Key
    6. Pop Quiz - Modeling Data
  26. Creating An Application
    1. Understanding Cassandra Drivers
    2. Exploring The DataStax Java Driver
    3. Setting Up A Development Environment
    4. Creating An Application Page
    5. Acquiring The DataStax Java Driver Files
    6. Getting The DataStax Java Driver Files Through Maven
    7. Providing The DataStax Java Driver Files Manually
    8. Connecting To A Cassandra Cluster
    9. Executing A Query
    10. Displaying Query Results - Part 1
    11. Displaying Query Results - Part 2
    12. Using An MVC Pattern
    13. Pop Quiz - Creating an Application
    14. Lab: Create A Second Application - Part 1
    15. Lab: Create A Second Application - Part 2
    16. Lab: Create A Second Application - Part 3
  27. Updating And Deleting Data
    1. Updating Data
    2. Understanding How Updating Works
    3. Deleting Data
    4. Understanding Tombstones
    5. Using TTLs
    6. Updating A TTL
    7. Pop Quiz - Updating and Deleting Data
    8. Lab: Update And Delete Data
  28. Selecting Hardware
    1. Understanding Hardware Choices
    2. Understanding RAM And CPU Recommendations
    3. Selecting Storage
    4. Deploying In The Cloud
    5. Pop Quiz - Selecting Hardware
  29. Adding Nodes To A Cluster
    1. Understanding Cassandra Nodes
    2. Having A Network Connection - Part 1
    3. Having A Network Connection - Part 2
    4. Having A Network Connection - Part 3
    5. Specifying The IP Address Of A Node In Cassandra
    6. Specifying Seed Nodes
    7. Bootstrapping A Node
    8. Cleaning Up A Node
    9. Using cassandra-stress
    10. Pop Quiz - Adding Nodes to a Cluster
    11. Lab: Add A Third Node
  30. Monitoring A Cluster
    1. Understanding Cassandra Monitoring Tools
    2. Using Nodetool
    3. Using JConsole
    4. Learning About OpsCenter
    5. Pop Quiz - Monitoring a Cluster
  31. Repairing Nodes
    1. Understanding Repair
    2. Repairing Nodes
    3. Understanding Consistency - Part 1
    4. Understanding Consistency - Part 2
    5. Understanding Hinted Handoff
    6. Understanding Read Repair
    7. Pop Quiz - Repairing Nodes
    8. Lab: Repair Nodes For A Keyspace
  32. Removing A Node
    1. Understanding Removing A Node
    2. Decommissioning A Node
    3. Putting A Node Back Into Service
    4. Removing A Dead Node
    5. Pop Quiz - Removing a Node
    6. Lab: Put A Node Back Into Service
  33. Redefining A Cluster For Multiple Data Centers
    1. Redefining For Multiple Data Centers - Part 1
    2. Redefining For Multiple Data Centers - Part 2
    3. Changing Snitch Type
    4. Modifying cassandra-rackdc.properties
    5. Changing Replication Strategy - Part 1
    6. Changing Replication Strategy - Part 2
    7. Pop Quiz - Redefining a Cluster
  34. Resources For Further Learning
    1. Accessing Documentation
    2. Reading Blogs And Books
    3. Watching Video Recordings
    4. Posting Questions
    5. Attending Events
    6. Wrap Up
    7. The Case for Kafka
    8. The Basics
    9. Setting up a Kafka Cluster
    10. Writing a Kafka Producer
    11. Writing a Kafka Consumer
    12. Using Kafka from Python
    13. Troubleshooting Kafka
    14. Integrating Kafka and Hadoop with Flafka
    15. Kafka Availability and Consistency
    16. Kafka Ecosystem
    17. Future of Kafka
    18. Pre-Flight Check
    19. Spark Deconstructed
    20. A Brief History
    21. Simple Spark Apps
    22. Spark Essentials
    23. Spark Examples
    24. Unifying the Pieces - Spark SQL
    25. Unifying the Pieces - Spark Streaming
    26. Unifying the Pieces - MLlib and GraphX
    27. Unified Workflows Demo
    28. The Full SDLC
    29. Developer Certification
    30. Resources
    31. Introduction - Why DataFrames?
    32. ETL to Prepare the Data from Capital Bikeshare
    33. Create a DataFrame, Explore using SQL
    34. Data Preparation for Machine Learning Models
    35. Build a Classifier Using Naive Bayes
    36. Build a Classifier Using Decision Trees
    37. Build a Classifier Using Random Forests
    38. Use a DataFrame to Compare Models
    39. Parquet as a Best Practice with DataFrames
    40. How to Store a DataFrame with Parquet
    41. How to Read a DataFrame Back in From Parquet
    42. Use SQL to Estimate Route Durations
    43. Data Preparation for GraphX - Model Route Costs
    44. Use PageRank to Rank Popular Stations
    45. Optimize Routes to Columbus Circle
    46. Compare Results with Google Maps
    47. Analyze a Popular Tourist Route
    48. Examples of How to Use DataFrames in Python
    49. Summary - The New DataFrames Features in Spark
  35. Introduction
    1. About Alluxio And The Course
    2. About The Author
  36. Using Alluxio Locally
    1. Downloading Alluxio
    2. Starting The System Locally
    3. Interacting Via The Shell
    4. Browsing The Web UI
  37. Examples With Alluxio
    1. Setting Up Alluxio With Spark And S3
    2. Running Spark on Alluxio with S3
    3. Using Alluxio With Unified Namespace
  38. Deploying Alluxio On A Cluster
    1. Deploying Alluxio In AWS
  39. Conclusion
    1. Contributing To The Project And Conclusion

Product information

  • Title: A Beginner's Guide to Architecting Big Data Applications
  • Author(s): O'Reilly Media, Inc.
  • Release date: December 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491978610