How to Develop Big Data Applications for Hadoop
Publisher: O'Reilly Media
Final Release Date: February 2011
Run time: 2 hours 25 minutes

Distributed applications running on Hadoop clusters can deliver powerful insights and results from the biggest data sets ever generated. But do you have to be a rocket scientist to use Hadoop? Fortunately, the answer is no. This tutorial explains the theory of MapReduce and shows how to develop big data applications in Java and in higher-level languages such as Pig and Hive SQL. Using practical, real-world examples such as weblog processing, analytics, and text summarization, it covers how to prototype, debug, monitor, test, and optimize big data applications for Hadoop’s distributed processing platform. Attendees get hands-on instruction and leave with a solid understanding of how to analyze data on Hadoop clusters, along with practical examples they can use and build on after the tutorial.

Review Snapshot (by PowerReviews)

Average rating: 4.0 (based on 2 reviews)

Ratings distribution:

  • 5 stars (0)
  • 4 stars (2)
  • 3 stars (0)
  • 2 stars (0)
  • 1 star (0)

 
4.0

This video is a great introduction

By wiebedj

from Vancouver, BC

About Me Developer

Verified Reviewer

Pros

  • Easy to understand
  • Helpful examples

Cons

    Best Uses

    • Intermediate

    Comments about O'Reilly's How to Develop Big Data Applications for Hadoop:

    This video records a morning session of O'Reilly's Strata 2011 conference, and while the editing could have been better, the content is excellent.

    It starts off with the history of Hadoop, the basics of map-reduce infrastructure, and the languages, libraries, and other supporting projects that go with it.

    There are overviews of Amazon Web Services (AWS) and Concurrent's Cascading product.

    One of the central ideas of the video is that MapReduce (MR) is too low level to express anything more than a simple algorithm. Tools such as Karmasphere Studio can help generate the needed boilerplate code when given a higher-level model. Tools that work with these higher-level models include:

    • Cascading, a visual flow layout tool for combining multiple MR steps
    • Hive, a SQL-like language that can work with almost any file type, including flat files
    • Pig, a language for data analysis

    A case study follows on how Playfish, a company that makes games that run on Facebook, uses Karmasphere Analyst to produce its reports. Every click in a Playfish game is treated as a tuple to be processed, and running a report used to take a long time. Now, with Analyst and AWS, reporting has sped up tremendously, enabling Playfish to respond to trends that much more quickly.

    Next came a hands-on lab, led by Abe Taha of Karmasphere, which was the highlight of the video. It covered:

    • installing Karmasphere Studio into Eclipse
    • working with the Hadoop perspective to set up clusters and related resources
    • using the Java perspective to create various artifacts, such as reducers, mappers, and partitioners
    • defining and loading data files with Karmasphere Analyst
    • using Hive to implement joins, which are easy in Hive but would be difficult in Java MR

    This was all then finished off with a Q&A session.

    Overall, a great video well worth the time.

    (1 of 1 customers found this review helpful)

     
    4.0

    This video is a great introduction

    By wiebedj

    from Vancouver, BC

    About Me Developer

    Verified Reviewer

    Pros

    • Easy to understand
    • Helpful examples

    Cons

    • Needs better editing

    Best Uses

    • Intermediate

    Comments about O'Reilly's How to Develop Big Data Applications for Hadoop:

    This video records a morning session of O'Reilly's Strata 2011 conference, featuring a panel of speakers from Karmasphere, Amazon Web Services, and Concurrent. The video comes in three parts totaling 145 minutes, and while the editing could have been better, the content is excellent.

    It starts off with the history of Hadoop, the basics of map-reduce infrastructure, and the languages, libraries, and other supporting projects that go with it.
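
    For readers who have not seen the map-reduce model mentioned above, here is a minimal in-memory sketch of its three stages (map, shuffle, reduce) applied to a word count. It is an illustration only and is not taken from the video: the function names and the sample input are assumptions, and a real Hadoop job would distribute these stages across a cluster rather than run them in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, invented for illustration.
counts = run_job(["big data on Hadoop", "big clusters"])
```

    Even this toy job needs a driver and a grouping step around the two lines of real logic, which gives a feel for the boilerplate the review refers to.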

    Ken Krugler of Amazon gives an overview of Amazon Web Services (AWS), followed by Chris Wensel of Concurrent, who talks about the company's Cascading product.

    One of the central ideas of the video is that MapReduce (MR) is too low level to express anything more than a simple algorithm. Tools such as Karmasphere Studio can help generate the needed boilerplate code when given a higher-level model. Tools that work with these higher-level models include:

    • Cascading, a visual flow layout tool for combining multiple MR steps
    • Hive, a SQL-like language that can work with almost any file type, including flat files
    • Pig, a language for data analysis

    A case study follows on how Playfish, a company that makes games that run on Facebook, uses Karmasphere Analyst to produce its reports. Every click in a Playfish game is treated as a tuple to be processed, and running a report used to take a long time. Now, with Analyst and AWS, reporting has sped up tremendously, enabling Playfish to respond to trends that much more quickly.

    Next came a hands-on lab, led by Abe Taha of Karmasphere, which was the highlight of the video. It covered:

    • installing Karmasphere Studio into Eclipse
    • working with the Hadoop perspective to set up clusters and related resources
    • using the Java perspective to create various artifacts, such as reducers, mappers, and partitioners
    • defining and loading data files with Karmasphere Analyst
    • using Hive to implement joins, which are easy in Hive but would be difficult in Java MR

    This was all then finished off with a Q&A session.
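
    To give a sense of why joins take more effort in raw MapReduce than in Hive, here is a small in-memory sketch of a reduce-side join. It is not the lab's code: the data sets, field names, and helper functions are all hypothetical, and the Hive query in the comment assumes tables named users and clicks that do not come from the video.

```python
from collections import defaultdict

# In Hive, an inner join over two hypothetical tables is a single statement:
#   SELECT u.name, c.page FROM users u JOIN clicks c ON (u.id = c.user_id);
# In plain MapReduce the same join has to be spelled out by hand.

# Hypothetical sample data: (user_id, name) and (user_id, page).
users = [(1, "ann"), (2, "bob")]
clicks = [(1, "/home"), (1, "/shop"), (2, "/home"), (3, "/lost")]


def map_users(record):
    """Tag each user record so the reducer can tell the sources apart."""
    user_id, name = record
    return user_id, ("U", name)


def map_clicks(record):
    """Tag each click record with a different marker."""
    user_id, page = record
    return user_id, ("C", page)


def reduce_join(user_id, tagged_values):
    """Pair every user name with every page seen for the same key."""
    names = [value for tag, value in tagged_values if tag == "U"]
    pages = [value for tag, value in tagged_values if tag == "C"]
    return [(user_id, name, page) for name in names for page in pages]


def run_join(user_records, click_records):
    """Shuffle tagged records by key, then reduce each group (inner join)."""
    groups = defaultdict(list)
    tagged = [map_users(u) for u in user_records]
    tagged += [map_clicks(c) for c in click_records]
    for key, value in tagged:
        groups[key].append(value)
    rows = []
    for key in sorted(groups):
        rows.extend(reduce_join(key, groups[key]))
    return rows


joined = run_join(users, clicks)
```

    The click for user 3 has no matching user record, so it drops out, which is the behavior of an inner join.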

    Overall, a great video well worth the time.
