Data Algorithms
Recipes for Scaling Up with Hadoop and Spark
Publisher: O'Reilly Media
Final Release Date: July 2015
Pages: 778

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects.

Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark.

Topics include:

  • Market basket analysis for a large set of transactions
  • Data mining algorithms (K-means, KNN, and Naive Bayes)
  • Using huge genomic data to sequence DNA and RNA
  • Naive Bayes theorem and Markov chains for data and market prediction
  • Recommendation algorithms and pairwise document similarity
  • Linear regression, Cox regression, and Pearson correlation
  • Allelic frequency and mining DNA
  • Social network analysis (recommendation systems, counting triangles, sentiment analysis)
Customer Reviews

4.7 (based on 25 reviews)

Ratings Distribution

  • 5 Stars (21)
  • 4 Stars (3)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (1)

96% of respondents would recommend this to a friend.

Pros

  • Helpful examples (22)
  • Well-written (22)
  • Easy to understand (20)
  • Accurate (19)
  • Concise (17)

Cons

No Cons

Best Uses

  • Intermediate (22)
  • Expert (19)
  • Student (10)
  • Novice (6)

Reviewer Profile

  • Developer (23)
  • Designer (14)
  • Educator (9)
  • Maker (6)
  • Sys admin (5)

Reviewed by 25 customers

Displaying reviews 1-10

 
4.0

A good guide for Hadoop/Spark beginners

By Usian

from London

About Me Developer, Educator

Pros

  • Easy to understand
  • Helpful examples

Cons

Best Uses

  • Expert
  • Intermediate
  • Student

Comments about O'Reilly Data Algorithms:

This book contains practical examples of how to use Hadoop/Spark to solve data problems.

I hope the author can provide full source code for the bioinformatics chapters: Chapter 18 (DNA Sequencing), Chapter 25 (RNA Sequencing), and Chapter 26 (Gene Aggregation).

     
5.0

MapReduce and Spark for Day-to-Day Work

By Ruben E.

from Cupertino, CA

About Me Designer, Developer, Educator, Maker

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

  • I was able to cut and paste two algorithms and, with minor modifications, run them in my projects
  • Covers a variety of subjects for data scientists and bioinformaticians
  • Provides simple foundations for most of the MapReduce algorithms, with working examples
  • Should cover and focus on JDK 8 in the next edition!

       
5.0

A must-read for big data developers and data scientists

By Raj Satya

from San Jose, CA

About Me Developer, Educator

Verified Reviewer

Pros

  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Student

Comments about O'Reilly Data Algorithms:

Well-articulated algorithms for the emerging field of big data and for data science in general. Cookbook-style code references for increasingly popular platforms like Spark and MapReduce, with an emphasis on bioinformatics.

         
5.0

Great Book

By Sam Rajaee

from Silicon Valley, CA

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

This is the quintessential book on data algorithms. It is written extremely clearly and with great detail. It is obvious that the author is an expert on this subject and has a very thorough understanding of what he is writing about. I would recommend this book to anyone interested in this subject.

           
5.0

MapReduce for Data Scientists and Bioinformaticians

By Susan E1.

from San Jose, CA

About Me Designer, Developer, Educator, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

Comments about O'Reilly Data Algorithms:

  • Provides practical working regression algorithms
  • Teaches MapReduce by simple examples
  • Explains the t-test in MapReduce very well

               
5.0

MapReduce in Practice

By Jeff, the Batman

from Cupertino, CA

About Me Designer, Developer, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

This book focuses on the practical MapReduce paradigm and on examples rather than on theory. The Spark and Hadoop examples can be used or adapted easily. Great effort, resource, and reference.

                 
5.0

Don't judge this book by its cover, do judge it by its weight

By A Programmer

from Santa Clara, CA

About Me Developer

Pros

  • Accurate
  • Complete Helpful Examples
  • Concise
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

Complete examples with context make this a very useful aid.

                   
5.0

Excellent book for learning Spark and MR!

By Neera

from Palo Alto, CA

About Me Developer

Pros

  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Algorithms:

An excellent resource for learning and trying out the Spark and MR paradigms. The book provides Spark and MapReduce solutions to a variety of data design patterns, and it does a great job of explaining all the examples.

A very useful book for newcomers to the field as well as for more experienced people. Most of the design concepts are available elsewhere, but I have not seen any other book that discusses how to implement these concepts using Spark and MR. Highly recommend it!

                     
5.0

A must-read, especially for Hadoop and Spark beginners

By Abhishek Guruvayya

from Santa Clara

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Algorithms:

This book provides a good introduction to Spark, very robust examples, and a deeper dive into most of the well-known and widely used data algorithms. The author also provides the book's examples on GitHub, which is very useful for gaining hands-on experience. I think it works as good study material for both data scientists and software engineers. The book manages to answer the relevant practical questions that come up when getting started with Spark. Its detailed explanations and comprehensive examples are its biggest advantages, and it delivers them in a clear and explanatory style.

                       
5.0

An emerging classic regarding Big Data problem solving

By kun

from San Jose, CA

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

Data Algorithms is another classic in the literature of Big Data problem solving. It provides elegant solutions for a list of Big Data issues in a variety of fields. All solutions provided in the book are implemented in both Hadoop Map/Reduce and Spark. It is also a very practical book; many of its algorithms and design patterns are frequently used in the workplace. These algorithms can easily be applied to make processes much more efficient, reducing unnecessary Map/Reduce processes and thus the pressure on our cluster.

The book is designed for intermediate and advanced readers. It is best read with a basic grasp of Map/Reduce, as well as a Java background, which makes the book's examples easier to follow.

Buying Options

Ebook: $59.99
Formats: ePub, Mobi, PDF
Print & Ebook: $76.99
Print: $69.99