Learning Spark
Lightning-Fast Big Data Analysis
Publisher: O'Reilly Media
Final Release Date: January 2015
Pages: 276

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables
Customer Reviews

Review Snapshot

Average rating: 3.9 (based on 19 reviews)

Ratings Distribution

  • 5 Stars (6)
  • 4 Stars (8)
  • 3 Stars (3)
  • 2 Stars (2)
  • 1 Star (0)

82% of respondents would recommend this to a friend.

Pros

  • Easy to understand (16)
  • Well-written (12)
  • Accurate (11)
  • Helpful examples (11)
  • Concise (10)

Cons

  • Not comprehensive enough (5)
  • Too basic (3)

Best Uses

  • Intermediate (13)
  • Novice (10)
  • Student (5)

Reviewer Profile

  • Developer (12)
  • Designer (3)

Reviewed by 19 customers

Displaying reviews 1-10

(1 of 1 customers found this review helpful)

2.0

A decent guided tour of Spark and its major components.

By Jascha

from Barcelona

Verified Reviewer

Pros

  • Easy to understand

Cons

  • Too basic

Comments about O'Reilly Learning Spark:

Over the last few years, Big Data has gathered an incredible amount of momentum. All this fuss and buzz led top companies, as well as fearless start-ups, to invest hours and cash in data solutions, some of which have emerged and established new standards. Being in the spotlight often led these projects to become open source. Among them is Spark, a cluster computing framework recently adopted by the Apache Software Foundation. Despite being a hot topic in 2015, the literature dedicated to the subject is still very limited. Among the few titles available, Learning Spark provides the curious reader with a decent overview of the major features of the framework.

Written by a group of enthusiasts and developers, including Matei Zaharia, the original creator of the framework, Learning Spark targets data scientists and engineers. As stated explicitly on the back cover, this book is neither a reference nor a cookbook. Its goal is to present a different, faster alternative to Hadoop's MapReduce paradigm and to the elephant made in Apache itself.

The reader is given a quick overview of the framework's capabilities, such as the built-in libraries, Spark SQL, and the many different data sources it can interact with. While not all the main features are presented, those found within these almost three hundred pages come with plenty of well-explained examples.

The examples, on the other hand, are one of the many perplexities raised by this text: each is presented in Python, Java, and Scala. While it is great to see many different bindings in action, any averagely skilled Pythonista can easily understand what happens in Java, and vice versa. This is even more true of Scala, another of the most wanted topics of recent years, inevitably tied to Java and its ecosystem.

Another thumbs down for the complete absence of anything related to Spark's internal architecture. The car looks nice, but what about the engine? How does it work? Magic? Witchcraft?

Again, the examples presented are clear and well explained, but no real-world case is shown. Spark is meant to be executed on huge clusters with scary amounts of data. True, this is a quick overview of the product, but "hello world" per se does not make me want to learn more.

Overall, a good read for that early-morning commute. It helps the curious reader pick up the basics of the framework. On the other hand, everything presented here can also be found on the Apache Software Foundation's web pages.

As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com. Feel free to stop by and share your thoughts!

(2 of 2 customers found this review helpful)

3.0

Not as good as I expected

By Tom

from Slovakia

Verified Reviewer

Cons

  • Difficult to understand
  • Not comprehensive enough

Comments about O'Reilly Learning Spark:

I decided to learn Spark from this book, but after a while I realized that it lacks a comprehensive real-world example: a use case that starts at the beginning with simple RDD transformations and then keeps adding features, such as file operations. The chapters are well organized, but I missed Python sample code in some places; the samples were just slices of a complete solution, which can be found on GitHub.

4.0

Good overview

By Thierry H.

from Montreal

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Cons

  • Too basic

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Learning Spark:

Good overview of Spark. For the size of the book, it would be difficult to fit better content into it. I just expected more material about the inner workings of Spark. The Tuning and Debugging chapter is way too light. It's often difficult to debug what's going wrong in Spark. OK, we can follow jobs, stages, and tasks in the web UI, but that's often not enough.

(0 of 1 customers found this review helpful)

5.0

Great for Beginners!

By sbalajis

from Hackettstown, NJ

About Me: Sys Admin

Verified Buyer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate
  • Novice
  • Student

Comments about O'Reilly Learning Spark:

Excellent guide for quick and precise learning.

(5 of 5 customers found this review helpful)

4.0

Good intro, but an update is needed

By renodino

from Reno, NV

About Me: Developer

Verified Reviewer

Pros

  • Easy to understand
  • Well-written

Cons

  • Not comprehensive enough
  • Outdated After 1 Month

Best Uses

  • Intermediate
  • Novice

Comments about O'Reilly Learning Spark:

Provides a good surface-level introduction, but could use more robust examples and maybe a deeper dive into some subject areas. Also, less than a month after the final release of the book, the new Spark 1.3 has invalidated many of the examples (esp. Spark SQL). Under those circumstances, I think updates to the ebook should be made available.

(2 of 11 customers found this review helpful)

2.0

Major mistakes concerning Windows support

By Al

from Philadelphia, USA

Pros

  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Intermediate

Comments about O'Reilly Learning Spark:

I just browsed the book, but right at the start the authors claim that Spark can be installed on any system with Java and Python installed. I am sure they never tried to install a pre-built package on Windows (there is none), and none of the pre-built packages works on Windows (because of the Hadoop dependency).

(4 of 10 customers found this review helpful)

3.0

Maybe Spark isn't for real data

By iceback

from SLC

Verified Reviewer

Pros

  • Concise
  • Easy to understand
  • Well-written

Cons

  • Not comprehensive enough

Comments about O'Reilly Learning Spark:

Definitely more thorough than most of the readily available examples out there, but it really doesn't go much beyond them. Maybe it's just me, but surely people are using Spark for things other than word count? Is "Big Data" really little more than bloated collections of independent strings?

4.0

Met my expectations

By Emma

from Spain

About Me: Developer

Verified Buyer

Pros

  • Accurate
  • Easy to understand

Cons

  • Not comprehensive enough

Best Uses

  • Intermediate

Comments about O'Reilly Learning Spark:

The perfect book to learn Apache Spark and get prepared for the Spark Developer Certification.

(2 of 2 customers found this review helpful)

5.0

If you are learning Spark, read this book

By just learning

from Seattle, WA

About Me: Analyst, Developer

Verified Reviewer

Pros

  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written

Best Uses

  • Novice
  • Student

Comments about O'Reilly Learning Spark:

There are many types of resources out there for learning Spark, but Learning Spark pulls together what you really need to keep in mind as you develop. I had taken a Spark class and watched many videos, and I still needed this book to fill in some of the gaps.

I think it works as bridging material for both the data scientist persona and the software engineer persona. The book manages to answer relevant practical questions that both will have while getting started with Spark, and it does so in an extremely accessible and clear explanatory style.

First you will learn the main abstractions of Spark and their particulars. There are useful code examples in the three main API languages. Then you will begin to learn some of the more advanced features and start to develop a basic understanding of how Spark and Spark applications are administered and tuned for performance. The book is helpful in developing an appreciation for how a Spark cluster could be a unifying, mixed-use platform that engages various personnel within an organization.

In the final chapters you will get a small taste of the parts of the stack that sit on top of Spark: MLlib, Spark SQL, Spark Streaming, and GraphX. After reading this book, I feel prepared to continue practicing hands-on with Spark, and particularly to deeply understand many of the other materials I have come across.

Grateful for Learning Spark.

(1 of 1 customers found this review helpful)

5.0

Using Spark? Buy this book.

By Tony Duarte

from Silicon Valley, CA

About Me: Developer, Educator

Verified Buyer

Pros

  • Accurate
  • Good for beginners
  • Helpful examples

Best Uses

  • Novice
  • Student

Comments about O'Reilly Learning Spark:

Being charitable, the official Spark documentation might be described as "sparse".

So having a book such as this, which covers the basics, really helps. Of course, I wish there were more details, but I'm mostly just grateful that the book exists.

Buying Options

  • Ebook: $33.99 (immediate access; formats: DAISY, ePub, Mobi, PDF)
  • Print & Ebook: $43.99
  • Print: $39.99