Data Algorithms
Recipes for Scaling Up with Hadoop and Spark
Publisher: O'Reilly Media
Final Release Date: July 2015
Pages: 778

If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects.

Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark.

Topics include:

  • Market basket analysis for a large set of transactions
  • Data mining algorithms (K-means, KNN, and Naive Bayes)
  • Using huge genomic data to sequence DNA and RNA
  • Naive Bayes theorem and Markov chains for data and market prediction
  • Recommendation algorithms and pairwise document similarity
  • Linear regression, Cox regression, and Pearson correlation
  • Allelic frequency and mining DNA
  • Social network analysis (recommendation systems, counting triangles, sentiment analysis)
Customer Reviews

REVIEW SNAPSHOT®

by PowerReviews
O'Reilly Data Algorithms

4.8 (based on 24 reviews)

Ratings Distribution

  • 5 Stars (21)
  • 4 Stars (2)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (1)

96% of respondents would recommend this to a friend.

Pros

  • Well-written (22)
  • Helpful examples (21)
  • Accurate (19)
  • Easy to understand (19)
  • Concise (17)

Cons

No Cons

Best Uses

  • Intermediate (21)
  • Expert (18)
  • Student (9)
  • Novice (6)

Reviewer Profile

  • Developer (22)
  • Designer (14)
  • Educator (8)
  • Maker (6)
  • Sys admin (5)

Reviewed by 24 customers

Displaying reviews 1-10

 
5.0

MapReduce and Spark for Day-to-Day Work

By Ruben E.

from Cupertino, CA

About Me Designer, Developer, Educator, Maker

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

  * Was able to cut and paste two algorithms and, with minor modifications, run them in my projects
  * Covers a variety of subjects for data scientists and bioinformaticians
  * Provides simple foundations for most of the MapReduce algorithms, with working examples
  * Should cover and focus on JDK 8 in the next edition!

     
5.0

A must-read for big data developers and data scientists

By Raj Satya

from San Jose, CA

About Me Developer, Educator

Verified Reviewer

Pros

  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Student

Comments about O'Reilly Data Algorithms:

Well-articulated algorithms in the emerging field of big data and data science in general. Cookbook-style code references for increasingly popular platforms like Spark and MapReduce, with an emphasis on bioinformatics.

       
5.0

Great Book

By Sam Rajaee

from Silicon Valley, CA

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

This is the quintessential book on data algorithms. It is written extremely clearly and with great detail. It is obvious that the author is an expert on this subject and has a very thorough understanding of what he is writing about. I would recommend this book to anyone interested in this subject.

         
5.0

MapReduce for Data Scientists and Bioinformaticians

By Susan E1.

from San Jose, CA

About Me Designer, Developer, Educator, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

Comments about O'Reilly Data Algorithms:

  * Provides practical, working regression algorithms
  * Teaches MapReduce through simple examples
  * Explains the t-test in MapReduce very well

             
5.0

MapReduce in Practice

By Jeff, the Batman

from Cupertino, CA

About Me Designer, Developer, Maker, Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

This book focuses on the practical MapReduce paradigm and on examples rather than on theory. The Spark and Hadoop examples can be used or adapted easily. Great effort, resource, and reference.

               
5.0

Don't judge this book by its cover; do judge it by its weight

By A Programmer

from Santa Clara, CA

About Me Developer

Pros

  • Accurate
  • Complete Helpful Examples
  • Concise
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

Complete examples with context make this a very useful aid.

                 
5.0

Excellent book for learning Spark and MR!

By Neera

from Palo Alto, CA

About Me Developer

Pros

  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Algorithms:

Excellent resource for learning and trying out the Spark and MR paradigms. The book provides Spark and MapReduce solutions to a variety of data design patterns, and it does a great job of explaining all the examples.

Very useful book for newcomers to the field as well as for more experienced people. Most of the design concepts are available elsewhere, but I have not seen any other book discuss how these concepts are implemented using Spark and MR. Highly recommended!

                   
5.0

A must-read, especially for Hadoop and Spark beginners.

By Abhishek Guruvayya

from Santa Clara

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Data Algorithms:

This book provides a good introduction to Spark, along with very robust examples and a deeper dive into most of the well-known and widely used data algorithms. The author also provides the book's examples on GitHub, which is very useful for trying them out and gaining hands-on experience. I think it works as good study material for both data scientists and software engineers. The book manages to answer the relevant practical questions that come up when getting started with Spark. Detailed explanations and comprehensive examples are the book's biggest advantage, and it delivers them in a clear, explanatory style.

                     
5.0

An emerging classic regarding Big Data problem solving

By kun

from San Jose, CA

About Me Developer

Verified Reviewer

Pros

  • Accurate
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Expert
  • Intermediate

Comments about O'Reilly Data Algorithms:

Data Algorithms is another classic in the literature of Big Data problem solving. It provides elegant solutions for a list of Big Data issues in a variety of fields. All solutions provided in the book are implemented in both Hadoop Map/Reduce and Spark. It is also a very practical book; many of its algorithms and design patterns are frequently used in the workplace. These algorithms can easily be applied to make processes much more efficient, reducing unnecessary Map/Reduce passes and thus the pressure on our cluster.

The book is designed for intermediate and advanced readers. It is best read with a basic grasp of Map/Reduce and a Java background, which make the book's examples easier to follow.

                       
5.0

Very helpful book with clear explanations and great examples

By Catarina

from Mountain View, CA

About Me Developer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

Best Uses

  • Intermediate
  • Student

Comments about O'Reilly Data Algorithms:

This book is very useful if you want to learn about different algorithms used in data analysis.
The explanations of concepts and algorithms are very clear, with well-written examples. The code is written in Java, which is easy to understand. Spark implementations are also provided, so you can run the examples directly and understand the algorithms.

Buying Options
Ebook: $59.99
Formats: ePub, Mobi, PDF
Print & Ebook: $76.99
Print: $69.99