An introduction to storing, structuring, and analyzing data at scale with Hadoop
Learning Hadoop 2 introduces you to the powerful system synonymous with Big Data, demonstrating how to create an instance and leverage the Hadoop ecosystem's many components to store, process, manage, and query massive data sets with confidence.
We open this course with an overview of the Hadoop component ecosystem, including HDFS, Sqoop, Flume, YARN, MapReduce, Pig, and Hive, before installing and configuring our Hadoop environment. We then take a look at Hue, the web-based graphical user interface for working with Hadoop.
We will then discover HDFS, Hadoop’s distributed file system, used to store data. We will learn how to import and export data, both manually and automatically. Afterward, we turn our attention toward running computations using MapReduce, and get to grips with Hadoop’s scripting language, Pig. Lastly, we will load data from HDFS into Hive, and demonstrate how it can be used to structure and query data sets.
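To give a flavor of the MapReduce programming model mentioned above, here is a minimal local sketch of the classic word-count job. Hadoop Streaming runs a mapper and a reducer as separate processes over lines of text; this snippet simulates both phases, plus the shuffle in between, in plain Python. The function names and sample data are illustrative assumptions, not taken from the course itself.

```python
# A minimal, local sketch of the MapReduce word-count pattern.
# Hadoop Streaming feeds lines of text to a mapper and groups its
# output by key before the reducer runs; we simulate that here.
# (Function names and sample input are illustrative, not from the course.)

from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum all the counts emitted for one word.
    return word, sum(counts)

def run_job(lines):
    # Shuffle step: group mapper output by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())

counts = run_job(["the quick brown fox", "the lazy dog"])
print(counts["the"])  # "the" appears twice across the input → 2
```

In a real Hadoop Streaming job, the mapper and reducer would be separate scripts reading from standard input, and Hadoop itself would handle the shuffle and distribute the work across the cluster.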
Who this course is for
This video course is designed for application and system developers interested in understanding how to manage and analyze large-scale data sets with the Hadoop framework. We expect familiarity with the Linux command line and a basic understanding of Java. No prior experience with Hadoop is required.
What you will learn from this course
- Install and configure a Hadoop instance of your own
- Navigate Hue, the GUI for common tasks in Hadoop
- Import data manually, and automatically from a database with Sqoop
- Build scripts with Pig to perform common ETL tasks
- Write and run a simple MapReduce program
- Structure and query data effectively with Hive, Hadoop’s built-in data warehousing component