Data Algorithms
Recipes for Scaling Up with Hadoop and Spark
Publisher: O'Reilly Media
Final Release Date: July 2015
Pages: 778

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects.

Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark.

Topics include:

  • Market basket analysis for a large set of transactions
  • Data mining algorithms (K-means, KNN, and Naive Bayes)
  • Using huge genomic data to sequence DNA and RNA
  • Naive Bayes theorem and Markov chains for data and market prediction
  • Recommendation algorithms and pairwise document similarity
  • Linear regression, Cox regression, and Pearson correlation
  • Allelic frequency and mining DNA
  • Social network analysis (recommendation systems, counting triangles, sentiment analysis)
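
As a rough illustration of the map-and-reduce style these recipes rely on, here is a minimal in-memory sketch of pair counting for market basket analysis. It is not taken from the book: the function names and sample transactions are hypothetical, and it uses plain Python rather than any Hadoop or Spark API.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(items):
    """Map step: emit ((item_a, item_b), 1) for each unordered item pair."""
    for pair in combinations(sorted(set(items)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one item pair."""
    return key, sum(values)


def pair_counts(transactions):
    """Run map, shuffle, and reduce over a small in-memory list of transactions."""
    mapped = (kv for t in transactions for kv in map_transaction(t))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
counts = pair_counts(transactions)
```

With this sample data, the pair ("bread", "milk") is counted twice and ("bread", "butter") once. In a real cluster job the shuffle step would be handled by the framework rather than written by hand.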
Customer Reviews

Average rating: 4.8 out of 5 (based on 10 reviews)

Ratings Distribution

  • 5 Stars (8)
  • 4 Stars (2)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Helpful examples (9)
  • Well-written (9)
  • Accurate (8)
  • Concise (8)
  • Easy to understand (7)

Cons

No Cons

Best Uses

  • Intermediate (9)
  • Expert (8)
  • Student (4)

Reviewer Profile

  • Developer (9), Designer (7), Educator (4)

Reviewed by 10 customers


(0 of 1 customers found this review helpful)

 
5.0

MapReduce Algorithms in Action!

By Dariush the MapReducer!

from Palo Alto, CA

About Me Designer, Developer, Educator, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

    Best Uses

    • Expert
    • Intermediate

    Comments about oreilly Data Algorithms:

    * simple Spark and MapReduce solutions to data design patterns
    * all examples are complete and running!
    * great selection of algorithms for engineers and bioinformaticians
    * Great Book on MapReduce algorithms!

    (0 of 1 customers found this review helpful)

     
    5.0

    Best Practical MapReduce Algorithms

    By Jane Sharif

    from Palo Alto, CA

    About Me Designer, Developer, Maker, Sys Admin

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Expert
      • Intermediate
      • Novice
      • Student

      Comments about oreilly Data Algorithms:

      I have read many books on MapReduce algorithms, but this book is different: it brings simplicity and practicality to the real world! The author explains hard concepts through simple MapReduce algorithms and examples, showing step-by-step map() and reduce() applications. I have already applied some of these algorithms in my projects. Great book!

      (0 of 1 customers found this review helpful)

       
      5.0

      Learn MapReduce with real examples

      By Martin Kman

      from Santa Barbara, CA

      About Me Designer, Developer, Educator

      Verified Reviewer

      Pros

      • Accurate
      • Concise
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Expert
        • Intermediate

        Comments about oreilly Data Algorithms:

        I used several MapReduce algorithms for my real projects at work. I was able to copy the skeletons of the programs from GitHub, then tweak and use them.
        The author has presented important algorithms with simple MapReduce and Spark programs, which really work.

         
        4.0

        Excellent resource for MapReduce recipes

        By Nitin

        from Mumbai

        About Me Developer

        Pros

        • Helpful examples
        • Well-written

        Cons

        • Not comprehensive enough

        Best Uses

        • Expert
        • Intermediate

        Comments about oreilly Data Algorithms:

        This is a very extensive book about using Hadoop and Spark to implement various algorithms. A variety of algorithms, from simple secondary sort to RNA sequencing, is covered in this mammoth book.
        The author has provided a complete set of algorithms and their implementations in both Hadoop and Spark. These include several common data algorithms in MapReduce, such as Top-N list, K-nearest neighbors, recommendation systems, sentiment analysis, and the Markov model. Several statistical problems are also included, such as Pearson correlation, Cox regression, the t-test, and so on. The author also covers DNA/RNA sequencing, allelic frequency, and gene aggregation.
        Overall, this book is an excellent resource, one of the very few books that properly explain how to use MapReduce to solve problems. Each algorithm is properly defined, with its map/reduce strategy laid out and explanations of both the Hadoop and Spark code. The author assumes the reader has basic proficiency in Hadoop, so this may not be useful for a complete beginner. The source code for all algorithms is also available in the appendix.
        On the downside, explanations could be a bit more detailed. Also, as of now, the book still has hand-drawn pictures, but I expect they will be replaced in the final release.

        (0 of 1 customers found this review helpful)

         
        5.0

        I would buy this product again

        By jfm

        from Spain

        Verified Buyer

        Comments about oreilly Data Algorithms:

        Perfect product to begin with really important tasks

        (0 of 1 customers found this review helpful)

         
        5.0

        Data Algorithms at Work!

        By David the programmer

        from San Jose, CA

        About Me Designer, Developer, Educator

        Verified Reviewer

        Pros

        • Accurate
        • Concise
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Expert
          • Intermediate

          Comments about oreilly Data Algorithms:

          This book brings data science to reality. Details are provided on MapReduce and distributed algorithms. It has an interesting chapter on monoids and MR and optimizations. The working examples are extremely helpful. Well done!

          (2 of 3 customers found this review helpful)

           
          5.0

          Covering a wide variety of MapReduce

          By Susan Z

          from Cupertino, CA

          About Me Designer, Developer

          Verified Reviewer

          Pros

          • Accurate
          • Concise
          • Easy to understand
          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Expert
            • Intermediate

            Comments about oreilly Data Algorithms:

            Enjoyed reading this book (can use for my work!): covers a wide variety of MapReduce and Spark programs. This is the first MR book which covers DNA-Seq and other statistical algorithms. Well done!

            (1 of 2 customers found this review helpful)

             
            5.0

            Great MapReduce Book

            By Sprintmoun100

            from Fargo, ND

            About Me Designer, Developer

            Verified Reviewer

            Pros

            • Accurate
            • Concise
            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Expert
              • Intermediate
              • Student

              Comments about oreilly Data Algorithms:

              Great MapReduce book on a variety of topics. Detailed examples on using Spark and Hadoop for MapReduce algorithms. The best part is that all solutions have source code on GitHub: https://github.com/mahmoudparsian/data-algorithms-book

              (2 of 3 customers found this review helpful)

               
              5.0

              MapReduce is nicely explained!

              By Mike Hanif

              from Falls Church, VA

              About Me Designer, Developer, Educator

              Pros

              • Accurate
              • Concise
              • Easy to understand
              • Helpful examples
              • Well-written

              Cons

                Best Uses

                • Expert
                • Intermediate
                • Student

                Comments about oreilly Data Algorithms:

                The author has given solid, working examples using MapReduce, Hadoop, and Spark. The algorithms range from basic to sophisticated (such as Markov chains, DNA sequencing, Naive Bayes, kNN, ...). I have already applied some of the MapReduce algorithms in my work. The Spark examples show step by step how to apply data algorithms to solve real problems.
                Some of the shell scripts need to be polished (I am sure they will be, since it is an early release!).

                (6 of 6 customers found this review helpful)

                 
                4.0

                Great Book, BUT....

                By Don E

                from Phoenix AZ

                About Me Developer

                Pros

                • Accurate
                • Concise
                • Easy to understand
                • Helpful examples
                • Well-written

                Cons

                  Best Uses

                  • Intermediate
                  • Novice
                  • Student

                  Comments about oreilly Data Algorithms:

                  I am happy to be putting this out before the book is out...

                  I have read the first six chapters and I really like it. But one thing I have a problem with is the use of the old API instead of the new one.

                  For instance, the book uses the JobConf class (which I thought was deprecated) instead of the Job class from the new API.

                  I tried to get this across to Mahmoud Parsian, but I was unable to find an email address. So could someone please pass the message along?

Buying Options

  • Ebook: $59.99 (formats: ePub, Mobi, PDF)
  • Print & Ebook: $76.99
  • Print: $69.99