Learning Spark
Lightning-Fast Big Data Analysis
Publisher: O'Reilly Media
Final Release Date: January 2015
Pages: 276

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables
Customer Reviews

4.0 out of 5 (based on 26 reviews)

Ratings Distribution

  • 5 Stars (9)
  • 4 Stars (11)
  • 3 Stars (4)
  • 2 Stars (2)
  • 1 Star (0)

79% of respondents would recommend this to a friend.

Pros

  • Easy to understand (20)
  • Well-written (16)
  • Accurate (15)
  • Helpful examples (15)
  • Concise (12)

Cons

  • Not comprehensive enough (7)
  • Too basic (5)

Best Uses

  • Intermediate (17)
  • Novice (15)
  • Student (9)

Reviewer Profile

  • Developer (17)
  • Designer (4)

Reviewed by 26 customers

Displaying reviews 1-10

(2 of 2 customers found this review helpful)

 
5.0

Very recommended

By jenny412

from Dublin, Ireland

About Me Developer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

    Best Uses

    • Intermediate
    • Novice
    • Student

    Comments about oreilly Learning Spark:

    I totally recommend this book. I tried to solve a programming problem for a week and with this book I understood how to fix it in about half an hour. Loved it.

    (0 of 2 customers found this review helpful)

     
    5.0

    Great

    By sumit

    from France

    Verified Buyer

    Comments about oreilly Learning Spark:

    It is good.

     
    4.0

    Good book to learn Spark

    By crorella

    from Sunnyvale, CA

    About Me Developer

    Verified Reviewer

    Pros

    • Accurate

    Cons

    • Too basic

    Best Uses

    • Novice
    • Student

    Comments about oreilly Learning Spark:

    A good introductory book.
    The only problem is that the project is moving very fast: some topics in the book are already outdated, while newer ones (such as DataFrames or Datasets) are not included.

     
    4.0

    Excellent book for beginners/intermediate Spark developers

    By RJ

    from Boston MA

    About Me Designer, Developer, Sys Admin

    Verified Reviewer

    Pros

    • Accurate
    • Well-written

    Cons

    • Difficult to understand

    Best Uses

    • Intermediate
    • Novice
    • Student

    Comments about oreilly Learning Spark:

    Overall very nicely written; the examples are provided in three languages: Scala, Java, and Python. It covers a wide array of topics. Perhaps they could have gone deeper in some areas, but it still gives you enough to get started.

    If you are looking for real-world examples, the follow-on book, Adv Spark, is a good read.

    This book also does not cover the architecture in great detail, and perhaps they could have done a better job organizing the topics, especially around the physical architecture. There are references to Executors and Block Manager in earlier chapters, before these services are introduced.

     
    5.0

    Very good Spark learning source

    By Narcis Stefanescu

    from Bucharest, Romania

    About Me Developer

    Verified Buyer

    Pros

    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

    • Not comprehensive enough

    Best Uses

    • Intermediate
    • Novice
    • Student

    Comments about oreilly Learning Spark:

    I find this book a very good starting point for learning Spark. I find the authors' structured presentation of the framework very useful as a basis for moving on to more domain-specific, detailed examples.

    (1 of 1 customers found this review helpful)

     
    4.0

    Great Starting Resource - Hadoop Knowledge Helps

    By Will J

    from Milwaukee, WI

    About Me Analyst

    Verified Reviewer

    Pros

    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate

      Comments about oreilly Learning Spark:

      With a small amount of Hadoop knowledge, you're ready to learn Spark. I used this book to get a basic understanding of the syntax of Spark.

      My only qualm was that I would like to have more case studies / full code examples. There are plenty of code snippets but I wish it had some complete examples (connect, load, process, save).

      (2 of 2 customers found this review helpful)

       
      3.0

      Rehash of the website

      By Ranga

      from India

      About Me Developer

      Verified Buyer

      Pros

      • Accurate
      • Easy to understand
      • Helpful examples

      Cons

      • Introductory
      • Not comprehensive enough
      • Too basic

      Best Uses

      • Novice

      Comments about oreilly Learning Spark:

      Content in the book is obtainable from the Spark website itself. So far, no topic struck me as being novel or dealt with in a detailed way. What I would have loved to see explained is the internals. For instance, I was keen on understanding how checkpointing worked, what is checkpointed, and at what frequency. Missing. I was then looking for a concrete example of how the Kafka Direct API in Spark Streaming worked, and specifically how to get the information on the current topic being consumed by each executor. Missing; in fact, I found a blog post on Cloudera more useful than the discussion in this book. Then I was looking for HBase connectivity examples. Missing. If you want to use Spark in a practical enterprise setting with ecosystem integration etc., be prepared to spend a lot more time searching the Internet for things. If you want to understand the internals of Spark, again you will be doing more Google searches and forum perusals than going back to this book. It merely serves the purpose of an introduction, which is a big letdown given the author list.

      (3 of 5 customers found this review helpful)

       
      2.0

      A decent guided tour of Spark and its major components.

      By Jascha

      from Barcelona

      Verified Reviewer

      Pros

      • Easy to understand

      Cons

      • Too basic

      Best Uses

        Comments about oreilly Learning Spark:

        Over the last few years Big Data has gathered an incredible amount of momentum. All this fuss and buzz has led top companies, as well as fearless start-ups, to invest hours and cash in data solutions, some of which have emerged as new standards. Being in the spotlight has often resulted in these projects becoming open source. Among these is Spark, a cluster computing framework recently adopted by the Apache Foundation. Despite being a hot topic in 2015, the literature dedicated to the subject is still very limited. Among the few titles available, Learning Spark provides the curious reader with a decent overview of the major features provided by the framework.

        Written by a group of enthusiasts and developers, including the original creator of the framework itself, Matei, Learning Spark targets data scientists and engineers. As expressly written on the back cover, this book is neither a reference nor a cookbook. Its goal is to present a different, faster alternative to Hadoop's Map/Reduce paradigm and to the elephant made in Apache itself.

        The reader is given a quick overview of the capabilities of the framework, such as the built-in libraries, Spark SQL, and the many different data sources it can interact with. While not all the main features are presented, those that are found within these almost three hundred pages come with plenty of well-explained examples.

        The examples are, on the other hand, one of the many perplexities raised by this text: each is presented in Python, Java, and Scala. While it is great to see many different bindings in action, any averagely skilled Pythonist can easily understand what happens in Java, and vice versa. This is even more true in the case of Scala, another much sought-after topic of recent years, inevitably related to Java and its ecosystem.

        Another thumbs down for the complete absence of anything related to Spark's internal architecture. The car looks nice, but what about the engine? How does it work? Magic? Witchery?

        Again, the examples presented are clear and well explained, but there is no real-world case shown. Spark is meant to be executed on huge clusters with scary amounts of data. True, this is a quick overview of the product, but "hello world" per se does not make me wanna learn more.

        Overall, a good read for that early morning hour of commute. It helps the curious reader to pick up the basics of the framework. On the other hand, everything presented can also be found on the web pages of the Apache Software Foundation.

        As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com Feel free to pass by and share your thoughts!

        (3 of 4 customers found this review helpful)

         
        3.0

        Not as good as I expected

        By Tom

        from Slovakia

        Verified Reviewer

        Pros

          Cons

          • Difficult to understand
          • Not comprehensive enough

          Best Uses

            Comments about oreilly Learning Spark:

            I decided to learn Spark from this book, but after a while I realized that it lacks a comprehensive real-world example: a use case that starts with simple RDD transformations and gradually adds more features, such as file operations. The chapters are well organized, but I missed Python sample code in some places; the samples were just slices of a complete solution, which can be found on GitHub.

            (1 of 1 customers found this review helpful)

             
            4.0

            Good overview

            By Thierry H.

            from Montreal

            About Me Developer

            Verified Buyer

            Pros

            • Accurate
            • Concise
            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

            • Too basic

            Best Uses

            • Intermediate
            • Novice

            Comments about oreilly Learning Spark:

            Good overview of Spark. For the size of the book, it is difficult to stuff better content in it. I just expected more material about the inner workings of Spark. The Tuning and Debugging chapter is way too light. It's often difficult to debug what's going wrong in Spark. OK, we can follow jobs, stages, and tasks in the web UI, but it's often not enough.

            Buying Options
            Ebook:  $33.99
            Formats:  DAISY, ePub, Mobi, PDF
            Print & Ebook:  $43.99
            Print:  $39.99