Hadoop Fundamentals for Data Scientists
Hadoop's Architecture, Distributed Computing Framework, and Analytical Ecosystem
Publisher: O'Reilly Media
Final Release Date: January 2015
Run time: 6 hours 5 minutes

Get a practical introduction to Hadoop, the framework that made big data and large-scale analytics possible by combining distributed computing techniques with distributed storage. In this video tutorial, hosts Benjamin Bengfort and Jenny Kim discuss the core concepts behind distributed computing and big data, and then show you how to work with a Hadoop cluster and program analytical jobs. You’ll also learn how to use higher-level tools such as Hive and Spark.

Hadoop is a cluster computing technology that has many moving parts, including distributed systems administration, data engineering and warehousing methodologies, software engineering for distributed computing, and large-scale analytics. With this video, you’ll learn how to operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets.

Once you’ve completed this video, you’ll understand how different parts of Hadoop combine to form an entire data pipeline managed by teams of data engineers, data programmers, data researchers, and data business people.

  • Understand the Hadoop architecture and set up a pseudo-distributed development environment
  • Learn how to develop distributed computations with MapReduce and the Hadoop Distributed File System (HDFS)
  • Work with Hadoop via the command-line interface
  • Use the Hadoop Streaming utility to execute MapReduce jobs in Python
  • Explore data warehousing, higher-order data flows, and other projects in the Hadoop ecosystem
  • Learn how to use Hive to query and analyze relational data using Hadoop
  • Use summarization, filtering, and aggregation to move Big Data towards last-mile computation
  • Understand how analytical workflows, including iterative machine learning, feature analysis, and data modeling, work in a Big Data context
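
To give a flavor of the Hadoop Streaming topic listed above, here is a minimal word-count sketch in Python. It is not taken from the video. It assumes the usual Streaming convention of tab-separated key/value records that arrive at the reducer sorted by key. The function names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` only imitates the map, sort, and reduce phases in a single process rather than on a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per token in each input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word.

    Assumes records arrive sorted by key, which is what the shuffle and
    sort phase provides between the map and reduce steps.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Hypothetical helper: imitate map -> sort -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


# Small local demonstration on two lines of made-up text.
for record in run_local(["the quick fox", "The lazy dog the end"]):
    print(record)
```

In an actual Streaming job, the mapper and reducer would typically live in separate scripts that read records from standard input and print results to standard output; the in-process version here is only meant for trying the logic locally.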

Benjamin Bengfort is a data scientist and programmer in Washington, DC, who prefers technology to politics but sees the value of data in every domain. Alongside his work teaching, writing, and developing large-scale analytics with a focus on statistical machine learning, he is finishing his PhD at the University of Maryland, where he studies machine learning and artificial intelligence.

Jenny Kim, a software engineer in the San Francisco Bay Area, develops, teaches, and writes about big data analytics applications. She specializes in large-scale distributed computing infrastructures and machine-learning algorithms that support recommendation systems.

Review Snapshot (by PowerReviews)

Hadoop Fundamentals for Data Scientists: 4.8 out of 5 stars (based on 5 reviews)

Ratings Distribution

  • 5 Stars: 4
  • 4 Stars: 1
  • 3 Stars: 0
  • 2 Stars: 0
  • 1 Star: 0

100% of respondents would recommend this to a friend.

Pros

  • Accurate (4)
  • Concise (4)
  • Easy to understand (4)
  • Helpful examples (4)
  • Well-written (3)

Cons

No Cons

Best Uses

  • Novice (3)

Reviewed by 5 customers

5.0

Pretty awesome!

By Ken

from Irvine, CA

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Hadoop Fundamentals for Data Scientists:

If you are interested in learning Hadoop, this is probably a good choice. Even if you do not know anything about it, try taking the course to better understand data. Getting new skills makes you a valuable person. I also took that course (http://www.thedevmasters.com); I also highly recommend it.

     
5.0

Excellent Coverage of Hadoop

By Tony

from DC

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Hadoop Fundamentals for Data Scientists:

Very intuitive explanations of the Hadoop ecosystem. Highly recommended.

       
4.0

Pretty good

By KS

from SF, CA

About Me Designer, Developer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Not comprehensive enough

Best Uses

  • Novice

Comments about O'Reilly Hadoop Fundamentals for Data Scientists:

This is a pretty good intro course. I would definitely recommend it to others.

I'd say this is on par with Rich Morrow's course.

-1 star for being a little out of date.

(1 of 1 customers found this review helpful)

       
5.0

Tour the Hadoop Ecosystem!

By Snehasish

from Greater Boston Area, MA

About Me Data Science Grad Student

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples

Cons

  • Pig Coverage Missing

Comments about O'Reilly Hadoop Fundamentals for Data Scientists:

These videos contain everything a data scientist needs to know to get up and running with the Hadoop ecosystem. The authors provide a fully configured VM that runs Hadoop, Hive, and Spark, which is very helpful.

The authors cover the concepts behind the Hadoop architecture, MapReduce, and HDFS in detail. They also show step-by-step examples of how to run MR jobs, Hive, Spark, and Hadoop Streaming, and what configuration is needed to get them running!

Pig was not covered in depth. The authors could have dedicated one or two sections to Pig.

I found these videos extremely helpful during my Hadoop projects. I ended up configuring Hadoop, Hive, and Pig natively on my Mac instead of using a VM (which makes my machine slow), but the content in these videos helped a lot because the official sites of the Hadoop projects don't have all the information needed to run Hadoop and its components natively on a Mac.

Overall, I highly recommend these videos for learning about Hadoop.

(1 of 2 customers found this review helpful)

         
5.0

Great video introduction to Hadoop!

By Cyprus

from Seattle, WA

Comments about O'Reilly Hadoop Fundamentals for Data Scientists:

Great introduction to Hadoop. Covers everything you need to know to get started with Hadoop for all those nasty data sets you want to analyze. Presenters are knowledgeable and easy to understand.

Buying Options
Immediate Access - Go Digital
Video: $119.99 (Streaming, Downloadable)