Learning Spark
Lightning-Fast Big Data Analysis
Publisher: O'Reilly Media
Final Release Date: January 2015
Pages: 276

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables
Customer Reviews

REVIEW SNAPSHOT® by PowerReviews

Learning Spark: 4.1 (based on 18 reviews)

Ratings Distribution

  • 5 Stars (6)
  • 4 Stars (8)
  • 3 Stars (3)
  • 2 Stars (1)
  • 1 Star (0)

88% of respondents would recommend this to a friend.

Pros

  • Easy to understand (15)
  • Well-written (12)
  • Accurate (11)
  • Helpful examples (11)
  • Concise (10)

Cons

  • Not comprehensive enough (5)

Best Uses

  • Intermediate (13)
  • Novice (10)
  • Student (5)

Reviewer Profile

  • Developer (12)
  • Designer (3)

Reviewed by 18 customers

Displaying reviews 1-10

3.0

Not as good as I expected

By Tom

from Slovakia

Verified Reviewer

Pros

    Cons

    • Difficult to understand
    • Not comprehensive enough

    Best Uses

      Comments about oreilly Learning Spark:

      I decided to learn Spark from this book, but after a while I realized that it lacks a comprehensive real-world example: a use case that starts at the beginning with simple RDD transformations and goes on to add more features, such as file operations. The chapters are well organized, but I missed Python sample code in some places, and the samples were just slices of a complete solution, which can be found on GitHub.

       
      4.0

      Good overview

      By Thierry H.

      from Montreal

      About Me Developer

      Verified Buyer

      Pros

      • Accurate
      • Concise
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

      • Too basic

      Best Uses

      • Intermediate
      • Novice

      Comments about oreilly Learning Spark:

      Good overview of Spark. For the size of the book, it would be difficult to fit better content into it. I just expected more material about the inner workings of Spark. The Tuning and Debugging chapter is way too light. It's often difficult to debug what's going wrong in Spark. OK, we can follow jobs, stages, and tasks in the web UI, but it's often not enough.

      (0 of 1 customers found this review helpful)

       
      5.0

      Great for Beginners!

      By sbalajis

      from Hackettstown, NJ

      About Me Sys Admin

      Verified Buyer

      Pros

      • Accurate
      • Concise
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Intermediate
        • Novice
        • Student

        Comments about oreilly Learning Spark:

        Excellent guide for quick and precise learning.

        (5 of 5 customers found this review helpful)

         
        4.0

        Good intro, but update is needed

        By renodino

        from Reno, NV

        About Me Developer

        Verified Reviewer

        Pros

        • Easy to understand
        • Well-written

        Cons

        • Not comprehensive enough
        • Outdated After 1 Month

        Best Uses

        • Intermediate
        • Novice

        Comments about oreilly Learning Spark:

        Provides a good surface level introduction, but could use more robust examples, and maybe a deeper dive in some subject areas. Also, less than a month after the final release of the book, the new Spark 1.3 has invalidated many of the examples (esp. Spark SQL). Under those circumstances, I think updates to the ebook should be made available.

        (2 of 11 customers found this review helpful)

         
        2.0

        Major mistakes concerning Windows support

        By Al

        from Philadelphia, USA

        Pros

        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Intermediate

          Comments about oreilly Learning Spark:

          I just browsed the book, but right at the start the authors claim that Spark can be installed on any system with Java and Python installed. I am sure they never tried to install a pre-built package on Windows: there is none for Windows, and none of the pre-built packages works on Windows (because of the Hadoop dependency).

          (4 of 10 customers found this review helpful)

           
          3.0

          Maybe Spark isn't for real data

          By iceback

          from SLC

          Verified Reviewer

          Pros

          • Concise
          • Easy to understand
          • Well-written

          Cons

          • Not comprehensive enough

          Best Uses

            Comments about oreilly Learning Spark:

            Definitely more thorough than most of the readily available examples out there, but it really doesn't go much beyond them. Maybe it's just me, but surely people are using Spark for things other than word count? Is "Big Data" really little more than bloated collections of independent strings?

             
            4.0

            Met my expectations

            By Emma

            from Spain

            About Me Developer

            Verified Buyer

            Pros

            • Accurate
            • Easy to understand

            Cons

            • Not comprehensive enough

            Best Uses

            • Intermediate

            Comments about oreilly Learning Spark:

            The perfect book to learn Apache Spark and get prepared for Spark Developer Certification.

            (2 of 2 customers found this review helpful)

             
            5.0

            If you are learning Spark, read this book

            By just learning

            from Seattle, WA

            About Me Analyst, Developer

            Verified Reviewer

            Pros

            • Accurate
            • Concise
            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Novice
              • Student

              Comments about oreilly Learning Spark:

              There are many types of resources out there for learning Spark, but Learning Spark pulls together what you really need to keep in mind as you develop. I had taken a Spark class and watched many videos, and I still needed this book to fill in some of the gaps.

              I think it works as bridging material for both the data scientist persona and the software engineer persona. The book manages to answer the relevant practical questions that both will have while getting started with Spark. It does this in an extremely accessible and clear explanatory style.

              First you will learn the main abstractions of Spark, and its particulars. There are useful code examples in the 3 main API languages. Then you will begin to learn some of the more advanced features, as well as starting to develop a basic understanding about how Spark and Spark applications are administered and tuned for performance. The book is helpful in developing an appreciation for how a Spark cluster could be a unifying mixed-use platform, engaging various different personnel within an organization.

              In the final chapters you will get a small taste of the parts of the stack that sit on top of Spark: MLlib, Spark SQL, Spark Streaming, and GraphX. After reading this book, I feel prepared to continue practicing hands-on with Spark, and particularly to understand more deeply many of the other materials I have come across.

              Grateful for Learning Spark.

              (1 of 1 customers found this review helpful)

               
              5.0

              Using Spark? Buy this book.

              By Tony Duarte

              from Silicon Valley, CA

              About Me Developer, Educator

              Verified Buyer

              Pros

              • Accurate
              • Good for beginners
              • Helpful examples

              Cons

                Best Uses

                • Novice
                • Student

                Comments about oreilly Learning Spark:

                Being charitable, the official Spark documentation might be described as "sparse".

                So having a book such as this, which covers the basics, really helps. Of course, I wish there were more details - but I'm mostly just grateful that the book exists.

                (8 of 10 customers found this review helpful)

                 
                4.0

                Review of Learning Spark

                By Arun

                from San Jose, CA

                About Me Developer

                Verified Buyer

                Pros

                • Accurate
                • Concise
                • Easy to understand

                Cons

                • Not comprehensive enough

                Best Uses

                • Intermediate

                Comments about oreilly Learning Spark:

                I am still reading the book, so these are preliminary comments. I'll continue to add comments as I read more chapters and as more chapters become available.

                The biggest shortcoming is the lack of Java 8 examples. Java 8 is gaining rapid adoption and when the book comes out in Feb 2015, it will be the preferred way of computing with Spark in Java. Here are the suggestions in preferred order:

                1. Include Java 8 examples along with Java 7 examples in the book. They will not take much space since they will be as compact as the Python examples.
                2. If the Java 8 examples are not in the book, all the examples with their Java 8 equivalents should be made available on GitHub *on the day the book is released*.

                Buying Options
                Ebook: $33.99
                Formats:  DAISY, ePub, Mobi, PDF
                Print & Ebook: $43.99
                Print: $39.99