Data Algorithms
Recipes for Scaling Up with Hadoop and Spark
Publisher: O'Reilly Media
Final Release Date: August 2014
Pages: 778

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.

Learn the algorithms and tools you need to build MapReduce applications with Hadoop for processing gigabyte, terabyte, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big data team at Illumina, takes you step-by-step through the design of machine-learning algorithms, such as Naive Bayes and Markov Chain, and shows you how to apply them to clinical and biological datasets, using MapReduce design patterns.

  • Apply MapReduce algorithms to clinical and biological data, such as DNA-Seq and RNA-Seq
  • Use the most relevant regression/analytical algorithms for different biological data types

REVIEW SNAPSHOT® by PowerReviews

oreilly Data Algorithms

4.8 (based on 10 reviews)

Ratings Distribution

  • 5 Stars (8)
  • 4 Stars (2)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Helpful examples (9)
  • Well-written (9)
  • Accurate (8)
  • Concise (8)
  • Easy to understand (7)

Cons

No Cons

Best Uses

  • Intermediate (9)
  • Expert (8)
  • Student (4)

Reviewer Profile

  • Developer (9), Designer (7), Educator (4)

Reviewed by 10 customers

(0 of 1 customers found this review helpful)

 
5.0

MapReduce Algorithms in Action!

By Dariush the MapReducer!

from Palo Alto, CA

About Me Designer, Developer, Educator, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about oreilly Data Algorithms:

* simple Spark and MapReduce solutions to data design patterns
* all examples are complete and running!
* great selection of algorithms for engineers and bioinformaticians
* Great Book on MapReduce algorithms!

(0 of 1 customers found this review helpful)

5.0

Best Practical MapReduce Algorithms

By Jane Sharif

from Palo Alto, CA

About Me Designer, Developer, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about oreilly Data Algorithms:

I have read many books on MapReduce algorithms, but this one is different: it brings simplicity and practicality to the real world! The author explains hard concepts through simple MapReduce algorithms and examples, showing step-by-step map() and reduce() applications. I have already applied some of these algorithms to my projects. Great book!

(0 of 1 customers found this review helpful)

5.0

Learn MapReduce with real examples

By Martin Kman

from Santa Barbara, CA

About Me Designer, Developer, Educator

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about oreilly Data Algorithms:

I used several MapReduce algorithms for my real projects at work. I was able to copy the skeletons of programs from GitHub, then tweak and use them.
The author has presented important algorithms with simple MapReduce and Spark programs, which really work.

         
4.0

Excellent resource for MapReduce recipes

By Nitin

from Mumbai

About Me Developer

Pros

  • Helpful examples
  • Well-written

Cons

  • Not comprehensive enough

Best Uses

  • Expert
  • Intermediate

Comments about oreilly Data Algorithms:

This is a very extensive book about using Hadoop and Spark to implement various algorithms. A variety of algorithms, from simple secondary sort to RNA sequencing, is covered in this mammoth book.
The author has provided a complete set of algorithms and their implementations in both Hadoop and Spark. These include several common data algorithms, such as Top-N list, K-nearest neighbors, recommendation system, sentiment analysis, and Markov model in MapReduce. Several statistical problems are also included, such as Pearson correlation, Cox regression, t-test, and so on. The author also covers DNA/RNA sequencing, allelic frequency, and gene aggregation.
Overall, this book is an excellent resource, one of the very few books that properly explain how to use MapReduce to solve problems. Each algorithm is properly defined, with its map/reduce strategy and explanations of both the Hadoop and Spark code. The author assumes the reader has basic proficiency in Hadoop, so this may not be useful for a complete beginner. The source code for all algorithms is also available in the appendix.
On the downside, explanations could be a bit more detailed. Also, as of now, the book still has hand-drawn pictures, but I expect they will be removed in the final release.

(0 of 1 customers found this review helpful)

5.0

I would buy this product again

By jfm

from Spain

Verified Buyer

Comments about oreilly Data Algorithms:

Perfect product to begin with really important tasks

(0 of 1 customers found this review helpful)

5.0

Data Algorithms at Work!

By David the programmer

from San Jose, CA

About Me Designer, Developer, Educator

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about oreilly Data Algorithms:

This book brings data science to reality. Details are provided on MapReduce and distributed algorithms. It has an interesting chapter on monoids, MR, and optimizations. Working examples are extremely helpful. Well done!

(1 of 2 customers found this review helpful)

5.0

Covering a wide variety of MapReduce

By Susan Z

from Cupertino, CA

About Me Designer, Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about oreilly Data Algorithms:

Enjoyed reading this book (can use for my work!): covers a wide variety of MapReduce and Spark programs. This is the first MR book which covers DNA-Seq and other statistical algorithms. Well done!

(1 of 2 customers found this review helpful)

5.0

Great MapReduce Book

By Sprintmoun100

from Fargo, ND

About Me Designer, Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Student

Comments about oreilly Data Algorithms:

Great MapReduce book on a variety of topics. Detailed examples on using Spark and Hadoop for MapReduce algorithms. The best part is that all solutions have source code on GitHub: https://github.com/mahmoudparsian/data-algorithms-book

(1 of 2 customers found this review helpful)

5.0

MapReduce is nicely explained!

By Mike Hanif

from Falls Church, VA

About Me Designer, Developer, Educator

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Student

Comments about oreilly Data Algorithms:

The author has given solid and working examples using MapReduce, Hadoop, and Spark. The range of algorithms spans from basic to sophisticated (such as Markov chains, DNA-Sequencing, Naive Bayes, kNN, ...). I have already applied some of the MapReduce algorithms to my work. Spark examples show step-by-step how to apply data algorithms to solve real problems.
Some of the shell scripts need to be polished (I am sure they will be, since it is an early release!).

(5 of 5 customers found this review helpful)

4.0

Great Book, BUT....

By Don E

from Phoenix AZ

About Me Developer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about oreilly Data Algorithms:

I am happy to be putting this out before the book is out...

I have read the first six chapters and I really like it. But one thing I have a problem with is the use of the old API instead of the new one.

For instance, using the JobConf class (which I thought was deprecated) instead of the Job class from the new API.

I tried to get this across to Mahmoud Parsian, but I was unable to find an email. So could someone please get the message across?

Buying Options
Pre-Order Print: $69.99
July 2015 (est.)