Advanced Analytics with Spark
Patterns for Learning from Data at Scale
Publisher: O'Reilly Media
Final Release Date: April 2015
Pages: 276

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.

You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

Patterns include:

  • Recommending music and the Audioscrobbler data set
  • Predicting forest cover with decision trees
  • Anomaly detection in network traffic with K-means clustering
  • Understanding Wikipedia with Latent Semantic Analysis
  • Analyzing co-occurrence networks with GraphX
  • Geospatial and temporal data analysis on the New York City Taxi Trips data
  • Estimating financial risk through Monte Carlo simulation
  • Analyzing genomics data and the BDG project
  • Analyzing neuroimaging data with PySpark and Thunder
Customer Reviews

Review Snapshot

4.3 out of 5 (based on 8 reviews)

Ratings Distribution

  • 5 Stars (4)
  • 4 Stars (3)
  • 3 Stars (0)
  • 2 Stars (1)
  • 1 Star (0)

88%

of respondents would recommend this to a friend.

Pros

  • Helpful examples (6)
  • Well-written (5)
  • Concise (3)

Cons

No Cons

Best Uses

  • Intermediate (7)

Reviewer Profile

  • Developer (4)

Reviewed by 8 customers

 
4.0

Great Book If You Have Some Background in Data Science

By Clayton the Data Scientist

from Northern VA

About Me Data Scientist

Verified Buyer

Pros

  • Helpful examples
  • Real World Data

Cons

    Best Uses

    • Intermediate

    Comments about oreilly Advanced Analytics with Spark:

    I am using the book to get up to speed with Spark. For reference, I have a solid background in CPU/GPU algorithm design, but no practical experience with the key-value MapReduce programming paradigm.

    Pros:

    The book goes over how to use a variety of machine learning techniques (alternating least squares, k-means clustering, Latent Semantic Analysis, etc.) with a variety of real data sets (network intrusion data, a text dump of Wikipedia, taxi cab data from NYC) to teach you how to perform data science tasks with Spark.

    The examples are well done, and what I think really makes the book is that it shows you how to get the same data the authors used when they wrote each example (some of the data is large, i.e. > 10GB, so downloading at home may take a bit).

    I have found the book very useful.

    The Cons:

    Skips over how the algorithms work internally by using a machine learning library for every example, which really only means that if you want to figure out why the algorithms work, you need to find another resource.

    Overall, if you have a data science background, I think this book is worth your time.

    (1 of 3 customers found this review helpful)

     
    2.0

    Only half an answer

    By Some guy

    from Portland, OR

    About Me Developer

    Verified Buyer

    Pros

      Cons

      • Not comprehensive enough
      • Not enough examples

      Best Uses

        Comments about oreilly Advanced Analytics with Spark:

        I bought this book entirely for the section on record linkage. But it starts by assuming the data is already in pairs, stored in files. Creating these pairs is one of the most crucial parts of record linkage, and while there may be a thousand ways to create blocks of pairs, an example of one of them would have been nice. It feels like the authors were being lazy and only talked about one of the easier steps in record linkage.

        (1 of 1 customers found this review helpful)

         
        5.0

        Simple about making Data Science with Spark and Scala

        By Alex

        from Bellevue, WA

        Verified Reviewer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Intermediate
          • Novice

          Comments about oreilly Advanced Analytics with Spark:

          Just finished this book. It is an interesting book that covers different aspects of data analysis. Each chapter is written as a separate use case and covers the data mining process end to end, from data acquisition and preparation through to evaluating results. I would say "THANKS" to the authors: this way of organizing complex data science material is very helpful and easy to understand. The second great thing about this book is that the authors describe the analysis method used in simple words at the beginning of each chapter and give links for further reading about the specific algorithm. The third good thing is that this book is a blend of data science, Spark, and Scala. So you read about how to analyze data while thinking about how to use Spark for it at the same time. That is really cool, plus you are training your Scala skills as well. Great book, would recommend reading it.

          (1 of 2 customers found this review helpful)

           
          4.0

          A good start but it will require frequent updates.

          By hazznain

          from Helsinki, Finland

          Pros

          • Helpful examples

          Cons

          • Inconsistent Writing Style

          Best Uses

          • Intermediate

          Comments about oreilly Advanced Analytics with Spark:

          A very good book for getting started with Spark MLlib. However, Spark as a platform is developing very fast, and constant updates to the book are required to keep up with Spark's evolution.

          (0 of 2 customers found this review helpful)

           
          5.0

          Awesome book; I highly recommend.

          By alfredo

          from San Antonio, TX

          Verified Buyer

          Pros

          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Expert
            • Intermediate

            Comments about oreilly Advanced Analytics with Spark:

            Still digesting contents and loving it.

            (4 of 4 customers found this review helpful)

             
            5.0

            Great intro for

            By Nicos

            from San Francisco Bay Area

            About Me Developer

            Pros

            • Accurate
            • Concise
            • Helpful examples
            • Well-written

            Cons

            • Some Confusing Typos

            Best Uses

            • Expert
            • Intermediate

            Comments about oreilly Advanced Analytics with Spark:

            Great and thorough coverage of the basic and advanced concepts of data collection, data analysis, modeling and model deployment using Spark's data processing pipeline.

            If you want a better understanding of why Spark is so disruptive in Data Science and why it truly democratizes the whole Big Data space, this book is a great primer for that.

            It is always hard to write a book about Data Science and Machine Learning without being labeled as too shallow or too difficult to read. I think the authors did a good job of balancing the explanation of advanced concepts with their practical implementation. Having said that, get ready to reread the book many times. This is the nature of the field, not the authors' fault.

            There are some typos that may confuse the reader (like "terms matrix" instead of "documents matrix" in places), but they are easy to overcome. In fact, those typos cause you to pause and "actively" work through the suggested algorithms.

            Disclaimer: this review is based on the early release

            (1 of 4 customers found this review helpful)

             
            5.0

            Great book

            By SzymonC

            from Warszawa

            About Me Developer, Educator

            Verified Buyer

            Pros

            • Well-written

            Cons

              Best Uses

              • Intermediate
              • Novice
              • Student

              Comments about oreilly Advanced Analytics with Spark:

              Great book about Spark. Just to the point.

              (1 of 3 customers found this review helpful)

               
              4.0

              A must read book : Simple and precise

              By Vishnu

              from India

              About Me Developer

              Verified Buyer

              Pros

              • Concise
              • Helpful examples
              • Well-written

              Cons

              • Not comprehensive enough

              Best Uses

              • Intermediate

              Comments about oreilly Advanced Analytics with Spark:

              * It helps you understand how data science algorithms can be used in Spark.

              * It shows how to extract features.

              * It helps you learn Spark through experiments.

              Buying Options
              Ebook:  $42.50
              Formats:  DAISY, ePub, Mobi, PDF
              Print & Ebook:  $54.99
              Print:  $49.99