Video description
Apache Kudu, a columnar storage engine, is often used in conjunction with other Hadoop ecosystem frameworks for data ingest, processing, and analysis. This practical, hands-on course shows you how Kudu works with four of those frameworks: Apache Spark, Spark SQL, MLlib, and Apache Flume.
You'll use the Kudu-Spark module with Spark and Spark SQL to create, move, and update data between Kudu and Spark, then use Apache Flume to stream events into a Kudu table, and finally query that table using Apache Impala. The course is designed for learners with some experience using Hadoop ecosystem components like HDFS, Hive, Spark, or Impala.
- Get hands-on experience with Kudu and add more tools to your Big Data toolbox
- Learn how to move data between Kudu tables and Spark apps using the Kudu-Spark module
- Understand how to stream and analyze data in real time with Flume and Kudu
- Create a movie ratings predictor using MLlib and save the predicted values into Kudu
- See how these open source tools combine to create simple and fast data engineering pipelines
Table of contents
- Welcome To The Course 00:00:23
- About The Author 00:01:27
- Integrating Kudu With Apache Flume 00:11:49
- Using Kudu With Apache Spark Part 1 00:07:45
- Using Kudu With Apache Spark Part 2 00:06:07
Product information
- Title: Using Kudu with Apache Spark and Apache Flume
- Author(s):
- Release date: March 2017
- Publisher(s): Infinite Skills
- ISBN: 9781491985717
You might also like
book
Getting Started with Kudu
Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to …
book
Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments …
video
Building a Near Real-Time Analytical Application with Kudu
Building near real-time analytical applications that combine real-time data inserts, updates, and fast analytics is almost …
video
Basic Kudu Installation, API Usage, and SQL Integration
Apache Kudu is a required skill in the Big Data world because it addresses problems that …