Hadoop Application Architectures
Designing Real-World Big Data Applications
Publisher: O'Reilly Media
Final Release Date: June 2015
Pages: 400

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case.

To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process.

This book covers:

  • Factors to consider when using Hadoop to store and model data
  • Best practices for moving data in and out of the system
  • Data processing frameworks, including MapReduce, Spark, and Hive
  • Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
  • Giraph, GraphX, and other tools for large graph processing on Hadoop
  • Using workflow orchestration and scheduling tools such as Apache Oozie
  • Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
  • Architecture examples for clickstream analysis, fraud detection, and data warehousing
Customer Reviews

Review Snapshot

Hadoop Application Architectures: 4.4 out of 5 (based on 11 reviews)

Ratings Distribution

  • 5 Stars (7)
  • 4 Stars (2)
  • 3 Stars (1)
  • 2 Stars (1)
  • 1 Star (0)

91% of respondents would recommend this to a friend.

Pros

  • Accurate (9)
  • Easy to understand (8)
  • Well-written (7)
  • Concise (6)
  • Helpful examples (6)

Cons

  • Not comprehensive enough (4)

Best Uses

  • Intermediate (10)
  • Expert (5)
  • Novice (3)

Reviewer Profile

  • Developer (7), Designer (4)

Reviewed by 11 customers

Displaying reviews 1-10


(2 of 2 customers found this review helpful)

 
4.0 out of 5

Good architectural guide for big data engineers

By Emre Sevinc

from Belgium

About Me: Designer, Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Well-written

Cons

  • Not comprehensive enough

Best Uses

  • Expert
  • Intermediate

Comments about Hadoop Application Architectures:

This is a book for software/data engineers who've been using Hadoop and related technologies for a while in practical projects, as well as for software architects looking for a high-level overview of how the many components of the big data technology stack relate to each other, and for justifications of which ones to use in different use cases.

The book is very clearly organized and proceeds logically through Hadoop storage options, how to ingest data into a Hadoop environment, how to choose and use processing engines for Hadoop such as MapReduce, Spark, and Hive, and how to use those engines for important and critical tasks such as record deduplication, windowing analysis, and time series modification. The exposition of these fundamental building blocks is followed by graph processing on Hadoop, where both Giraph and Spark GraphX are described and contrasted. Then the orchestration of Hadoop workflows is described to an extent, mainly showing how to configure and use Oozie. Part I finishes by describing near-real-time processing in Hadoop, and shows how Storm, Trident, and Spark Streaming can be used to satisfy different requirements.

The second part of the book is dedicated to real-world use cases such as Clickstream Analytics, Fraud Detection, and Data Warehousing. The authors provide a good, broad overview for each case, clearly showing where and how the Hadoop software stack helps, together with architectural recommendations, but I think the final use case, the Data Warehousing chapter, is the most interesting one because it makes use of a very popular, publicly available movie data set known as MovieLens. Thanks to this, it is very easy to follow this chapter using the same data, apply the designs and programming steps, create your own customizations, and investigate different scenarios and technical challenges you come up with.

In conclusion, I can recommend this book to big data architects and software engineers who are not total novices when it comes to Hadoop. The book is of course a bit dated: in the very fast-moving world of big data, 2015 already sounds like the distant past. But thanks to the authors' extensive industrial and practical experience, the way they explain their thinking and justifications for very different scenarios sheds light on current and upcoming challenges for many big data engineers.

(1 of 2 customers found this review helpful)

 
4.0 out of 5

Useful reference

By Jeff

from NY, NY

Verified Buyer

Pros

  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Not comprehensive enough

Best Uses

  • Intermediate
  • Novice

Comments about Hadoop Application Architectures:

Useful for validating some architectural choices in this technology stack.
Goes into some detail about HDFS file formats, compression, container formats, etc.
Provides an overview of the best-known, most often mentioned Hadoop tools.

Overall, the level of detail is good for up to an intermediate user.
I can see that people more experienced in this space might find the content somewhat high level.

It does, though, serve my purpose of getting baseline architecture concepts in this technology.

 
5.0 out of 5

Essential reading for any Data Engineer/Architect

By Gideon

from Israel

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Helpful examples

Cons

Best Uses

  • Intermediate

Comments about Hadoop Application Architectures:

An excellent and detailed overview of Big Data architecture patterns and best practices.
The book covers most aspects of data ingestion and processing, including case studies which teach you how to design practical data pipelines.
Recommended for any big data engineer/architect.

(1 of 3 customers found this review helpful)

2.0 out of 5

The book title doesn't match expectations

By anand

from Atlanta, GA

Verified Buyer

Pros

Cons

  • Difficult to understand
  • Not comprehensive enough
  • Too basic

Best Uses

Comments about Hadoop Application Architectures:

In general we would expect a lot on design and development methodologies for an application, but this book doesn't help much with that. This book speaks a lot about Spark, but not about the real use case.

(1 of 1 customers found this review helpful)

5.0 out of 5

Excellent book. Highly recommended!

By pnwhitney

from Allen, TX

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about Hadoop Application Architectures:

Great overview of everything Hadoop and then some!

           
5.0 out of 5

Well put together Hadoop architecture book

By Go COLTS!

from IN

About Me: Designer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate

Comments about Hadoop Application Architectures:

The book provides just enough depth for an architect to get a general understanding of the ingest methods and processing frameworks on Hadoop without going into too much syntax.

             
5.0 out of 5

Essential concern-centric, not tool-centric, view of Hadoop

By Sean

from London, UK

About Me: Developer

Verified Reviewer

Pros

  • Accurate
  • Helpful examples

Cons

Best Uses

  • Expert
  • Intermediate

Comments about Hadoop Application Architectures:

Disclosure: the authors are my coworkers, so I know what they've been up to and have awaited the release of this book.

I believe it provides a much-needed guide for developers and architects working in the Hadoop ecosystem, since it focuses on cross-cutting concerns, not just tools. The first half is organized around essential elements of any application architecture: formats, schemas, ingest, processing, and workflow orchestration. It connects these to tools in the ecosystem and gives a survey of their use, but focuses on the issues and the general solutions rather than just the tools that are deployed. This is refreshing.

The second half works through end-to-end examples of architecting common use cases, like clickstream processing and fraud detection. A great guide for those who want to understand the "why" and "how" of Hadoop app development and not just the "what".

(0 of 1 customers found this review helpful)

5.0 out of 5

Good reference book

By Shridhar R

from Bangalore, India

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand

Cons

Best Uses

  • Expert
  • Intermediate

Comments about Hadoop Application Architectures:

One of the best books for Hadoop architects and developers. It is concise and explains architecture patterns in a simple way. This is a good reference book for making architectural or design decisions.

(1 of 3 customers found this review helpful)

3.0 out of 5

Very nice introduction to Hadoop!

By Svende

from Copenhagen

About Me: Designer, Developer

Verified Buyer

Pros

  • Accurate
  • Easy to understand
  • Well-written

Cons

  • Missing an area description
  • Missing load examples
  • Not comprehensive enough

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about Hadoop Application Architectures:

I have read four chapters. We have gotten great value from reading these chapters, but we still have problems understanding the load processes.
I would like more information about the ingestion/load process: some examples showing what can/should happen until the data is in place.

(0 of 2 customers found this review helpful)

5.0 out of 5

I would buy again

By Steve

from NY

About Me: Designer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about Hadoop Application Architectures:

Contains a plethora of good information necessary to help architect a Hadoop environment. Needs to be blended with ongoing app updates and techniques.


                   
                  Buying Options
  • Ebook: $42.99 (formats: DAISY, ePub, Mobi, PDF)
  • Print & Ebook: $54.99
  • Print: $49.99