Learning Spark
Lightning-Fast Big Data Analytics
Publisher: O'Reilly Media
Final Release Date: June 2014
Pages: 300

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.

The Web is getting faster, and the data it delivers is getting bigger. How can you handle everything efficiently? This book introduces Spark, an open source cluster computing system that makes data analytics fast to run and fast to write. You’ll learn how to run programs faster, using primitives for in-memory cluster computing. With Spark, your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce.

Written by the developers of Spark, this book will have you up and running in no time. You’ll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoop’s raw Java API.

  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
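
As a rough illustration of the operations named in the list above, here is a minimal plain-Python sketch of what hash partitioning, count, collect, and reduce compute over a partitioned dataset. It is not Spark code and is not taken from the book. The names `hash_partition`, `count`, `collect`, and `reduce_values` are hypothetical stand-ins chosen for this sketch; Spark's own API differs and distributes these steps across a cluster.

```python
from functools import reduce

def hash_partition(records, num_partitions):
    """Place each (key, value) record in a bucket chosen by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def count(partitions):
    """Total number of records across all partitions."""
    return sum(len(part) for part in partitions)

def collect(partitions):
    """Gather every record into one local list."""
    return [record for part in partitions for record in part]

def reduce_values(partitions, op):
    """Reduce values inside each non-empty partition, then merge the partials."""
    partials = [reduce(op, [v for _, v in part]) for part in partitions if part]
    return reduce(op, partials)

# Hypothetical sample data, used only for this illustration.
records = [("spark", 1), ("hadoop", 2), ("spark", 3), ("hive", 4)]
parts = hash_partition(records, num_partitions=3)

print(count(parts))                              # 4
print(reduce_values(parts, lambda a, b: a + b))  # 10
```

Because the partition is chosen from the key's hash, both "spark" records land in the same bucket. The two-stage reduce only gives a stable answer when the operator is associative and commutative, as addition is here.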
Customer Reviews

REVIEW SNAPSHOT® by PowerReviews

Learning Spark: 4.2 (based on 9 reviews)

Ratings Distribution

  • 5 Stars (3)
  • 4 Stars (5)
  • 3 Stars (1)
  • 2 Stars (0)
  • 1 Star (0)

78% of respondents would recommend this to a friend.

Pros

  • Easy to understand (8)
  • Accurate (6)
  • Helpful examples (6)
  • Well-written (6)
  • Concise (5)

Cons

Best Uses

  • Intermediate (8)
  • Novice (5)

Reviewer Profile

  • Developer (7), Designer (3)

    Reviewed by 9 customers


    (2 of 2 customers found this review helpful)

     
    4.0

    Review of Learning Spark

    By Arun

    from San Jose, CA

    About Me Developer

    Verified Buyer

    Pros

    • Accurate
    • Concise
    • Easy to understand

    Cons

    • Not comprehensive enough

    Best Uses

    • Intermediate

    Comments about oreilly Learning Spark:

    I am still reading the book, so these are preliminary comments. I'll continue to add comments as I read more chapters and as more chapters become available.

    The biggest shortcoming is the lack of Java 8 examples. Java 8 is gaining rapid adoption, and by the time the book comes out in Feb 2015 it will be the preferred way of computing with Spark in Java. Here are my suggestions, in order of preference:

    1. Include Java 8 examples along with Java 7 examples in the book. They will not take much space since they will be as compact as the Python examples.
    2. If the Java 8 examples are not in the book, all the examples with their Java 8 equivalents should be made available on GitHub *on the day the book is released*.

    (1 of 1 customers found this review helpful)

     
    3.0

    Good book for beginners

    By Tarun

    from San Francisco, CA

    About Me Designer, Developer

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Helpful examples

    Cons

    • Too basic

    Best Uses

    • Intermediate

    Comments about oreilly Learning Spark:

    The book is good, but I was expecting more, maybe because this is the only book available on the market.

    I am looking for:
    1. More examples.
    2. API-level descriptions
    3. Best practices (if any)

     
    5.0

    Thank You, Thank You, Thank You

    By 2bz4SQL

    from SOMA & Davis

    About Me Big Data Architect

    Verified Buyer

    Pros

    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate
      • Novice

      Comments about oreilly Learning Spark:

      Spark is such a fast moving target that finding relevant, non-obsolete advice & examples is a difficult task. This book has really made this task much simpler. I have a much better understanding of the concepts now and this has really helped me to add Spark to an existing Cassandra project. I look forward to the additional chapters.

      I have downloaded just about every piece of documentation from the Databricks site and watched just about every webinar or PowerPoint slide that I could find. This book has really helped to fill in the gaps and to help me understand the finer points of the excellent Databricks presentations from Paco & the rest.

       
      5.0

      A great way to jump into Apache Spark

      By dennyglee

      from Seattle, WA

      About Me Developer

      Verified Buyer

      Pros

      • Accurate
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Intermediate
        • Novice
        • Student

        Comments about oreilly Learning Spark:

        While it is an early release, this is the book you will want to get to understand how to work with Apache Spark. As noted in previous reviews, you can find most of the concepts online, but it's great to have one reference that covers all of them. Looking forward to more chapters!

        (1 of 1 customers found this review helpful)

         
        4.0

        Definitive guide to Spark

        By Jacek Laskowski - Spark newcomer

        from Warsaw, Poland

        About Me Developer, Educator

        Verified Reviewer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Intermediate
          • Novice

          Comments about oreilly Learning Spark:

          Read the book if you're curious about Apache Spark and are on the lookout for a more systematic approach to learning its features. Even at this stage of writing, the book can be very useful for newcomers to the field as well as for more experienced people.

          With a few weeks of learning Spark under my belt, I needed a book to overcome initial hurdles and reach a higher level of confidence in applying Spark where it'd fit well. I simply needed a mentor who'd guide me through the "what, when, how" of Spark, and the book did that far beyond my expectations.

          There are already 5 chapters of a quality that I didn't expect from a book in early release. I must admit that the content is polished, and after reading the chapters I want more of it. The book is written by committers of the project, and their writing style is very engaging, with enough theory and code samples in Java, Scala and Python. There are many use cases for which Spark is a valid software offering, and I can hardly imagine how much my Spark skills will have grown after the chapters yet to come, such as Advanced Programming with RDDs, Spark Architecture and Deploying Spark. It's undoubtedly going to be a painful experience waiting for them to show up.

          If the 5 chapters (out of 13 planned) are any indication of what the book is going to look like in the final version, I'm fully confident of its success: it's going to be the bestseller in the area of Big Data Analytics. No programming language, out of Java, Scala or Python, is favoured. As the authors pointed out in the initial pages, they're going to show examples of using Spark in the three programming languages, and they're doing it for each and every use case. That's also one of the selling points of Spark that the book highlights very well: the samples are simple to comprehend, almost no-brainers, and can easily fit on a page, even in all three languages. Without Spark the samples would not have been so easy to implement and would've required much more from the implementer, be it an engineer or a data analyst. The book demonstrates it well.

          While we're at it, the two job titles, software engineer and data analyst, describe the people the book targets. It was this book that helped me to notice the difference between them and how Spark blends their needs into a single software offering. After the 5 chapters Spark appears so simple that I doubt there's anything that can surprise me that would not be a bug or an intended (yet surprising) feature. The book has helped me to build confidence in understanding the benefits of using Spark in my project, and I'm really looking forward to reading the remaining chapters. I'm hoping that the authors and the publisher won't make me wait too long.

           
          4.0

          A must read

          By Olivier NOUGUIER

          from Montpellier, France

          About Me Developer

          Verified Buyer

          Pros

          • Accurate
          • Concise
          • Easy to understand
          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Intermediate

            Comments about oreilly Learning Spark:

            Great content, very well written. I'm longing for the next chapters!

            (2 of 2 customers found this review helpful)

             
            4.0

            Excellent so far

            By Ramesh M

            from Indianapolis, IN

            About Me Designer, Developer

            Verified Buyer

            Pros

            • Concise
            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Intermediate
              • Novice
              • Student

              Comments about oreilly Learning Spark:

              Even though basic Spark programming concepts and examples can be found on the Internet, reading a book with a structured format is easier and less time-consuming. The authors have done a great job of explaining the Spark functions with examples. That allowed me to get a better understanding of the basic concepts easily and to start applying them to my project.

              I'm looking forward to the next set of early-access chapters.

               
              5.0

              Brilliant intro to Spark

              By Helipilot50

              from Dallas, TX, USA

              About Me Designer, Developer, Maker, Seasoned IT Professional

              Verified Buyer

              Pros

              • Accurate
              • Concise
              • Easy to understand
              • Well-written

              Cons

              • Needs To Be Finished

              Best Uses

              • Intermediate
              • Novice

              Comments about oreilly Learning Spark:

              Spark is a credible compute engine that scales massively. This book not only assists you in getting started with Spark, but also helps you to adjust your thinking so you can fully exploit its parallel processing. If you have a need for Complex Event Processing, or "as it happens" processing rather than the batch processing of Hadoop, then Spark is your solution.

              (0 of 6 customers found this review helpful)

               
              4.0

              Really cool

              By Drodri

              from Barcelona

              Verified Buyer

              Comments about oreilly Learning Spark:

              Waiting for the new update!

              Buying Options
              Pre-Order  Print: $39.99
              February 2015 (est.)