Learning Spark
Lightning-Fast Big Data Analytics
Publisher: O'Reilly Media
Final Release Date: June 2014
Pages: 300

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.

The Web is getting faster, and the data it delivers is getting bigger. How can you handle everything efficiently? This book introduces Spark, an open source cluster computing system that makes data analytics fast to run and fast to write. You’ll learn how to run programs faster, using primitives for in-memory cluster computing. With Spark, your job can load data into memory and query it repeatedly, much more quickly than with disk-based systems such as Hadoop MapReduce.

Written by the developers of Spark, this book will have you up and running in no time. You’ll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoop’s raw Java API.

  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
Customer Reviews

REVIEW SNAPSHOT®

4.3 out of 5 (based on 6 reviews)

Ratings Distribution

  • 5 Stars (2)
  • 4 Stars (4)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

83% of respondents would recommend this to a friend.

Pros

  • Easy to understand (5)
  • Well-written (5)
  • Accurate (4)
  • Concise (4)
  • Helpful examples (4)

Cons

Best Uses

  • Intermediate (5)
  • Novice (4)

Reviewer Profile

  • Developer (5)

Reviewed by 6 customers

    5.0

    A great way to jump into Apache Spark

    By dennyglee

    from Seattle, WA

    About Me Developer

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Helpful examples
    • Well-written

    Cons

      Best Uses

      • Intermediate
      • Novice
      • Student

      Comments about oreilly Learning Spark:

      While it is an early release, this is the book you will want to get to understand how to work with Apache Spark. As noted in previous reviews, you can find most of the concepts online, but it's great to have one reference that covers all of them. Looking forward to more chapters!

       
      4.0

      Definitive guide to Spark

      By Jacek Laskowski - Spark newcomer

      from Warsaw, Poland

      About Me Developer, Educator

      Verified Reviewer

      Pros

      • Accurate
      • Concise
      • Easy to understand
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Intermediate
        • Novice

        Comments about oreilly Learning Spark:

        Read the book if you're curious about Apache Spark and are on the lookout for a more systematic approach to learning its features. Even at this stage of writing, the book can be very useful for newcomers to the field as well as for more experienced people.

        With a few weeks of learning Spark under my belt, I needed a book to overcome initial hurdles and reach a higher level of confidence in applying Spark where it would fit well. I simply needed a mentor who would guide me through the "what, when, how" of Spark, and the book did that far beyond my expectations.

        There are already 5 chapters of a quality that I didn't expect from a book in early release. I must admit that the content is polished, and after reading the chapters I want more of it. The book is written by committers of the project, and their writing style is very engaging, with enough theory and code samples in Java, Scala and Python. There are many use cases for which Spark is a valid software offering, and I can hardly imagine how much my Spark skills will have grown after the chapters yet to come, such as Advanced Programming with RDDs, Spark Architecture and Deploying Spark. It's undoubtedly going to be a painful experience waiting for them to show up.

        If the 5 chapters (out of 13 planned) are any indication of what the final version will look like, I'm fully confident of its success: it's going to be the bestseller in the area of Big Data Analytics. No programming language out of Java, Scala or Python is favoured. As the authors point out in the initial pages, they show examples of using Spark in all three programming languages, and they do so for each and every use case. That's also one of the selling points of Spark that the book highlights very well: the samples are simple to comprehend, almost no-brainers, and can easily fit on a page, even in all three languages. Without Spark the samples would not have been so easy to implement and would have required much more from the implementer, be it an engineer or a data analyst. The book demonstrates this well.

        While we're at it, the book targets two job titles: software engineer and data analyst. It was this book that helped me notice the difference between them and how Spark blends their needs into a single software offering. After the 5 chapters Spark appears so simple that I doubt anything could surprise me that would not be a bug or an intended (yet surprising) feature. The book has helped me build confidence in understanding the benefits of using Spark in my project, and I'm really looking forward to reading the remaining chapters. I'm hoping that the authors and the publisher won't make me wait too long.

         
        4.0

        A must read

        By Olivier NOUGUIER

        from Montpellier, France

        About Me Developer

        Verified Buyer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Helpful examples
        • Well-written

        Cons

          Best Uses

          • Intermediate

          Comments about oreilly Learning Spark:

          Great content, very well written. I'm longing for the next chapters!

          (2 of 2 customers found this review helpful)

           
          4.0

          Excellent so far

          By Ramesh M

          from Indianapolis, IN

          About Me Designer, Developer

          Verified Buyer

          Pros

          • Concise
          • Easy to understand
          • Helpful examples
          • Well-written

          Cons

            Best Uses

            • Intermediate
            • Novice
            • Student

            Comments about oreilly Learning Spark:

            Even though basic Spark programming concepts and examples can be found on the Internet, reading a book with a structured format is easier and less time-consuming. The authors have done a great job of explaining the Spark functions with examples. That allowed me to get a better understanding of the basic concepts and start applying them to my project.

            I'm looking forward to the next set of early-access chapters.

             
            5.0

            Brilliant intro to Spark

            By Helipilot50

            from Dallas, TX, USA

            About Me Designer, Developer, Maker, Seasoned IT Professional

            Verified Buyer

            Pros

            • Accurate
            • Concise
            • Easy to understand
            • Well-written

            Cons

            • Needs To Be Finished

            Best Uses

            • Intermediate
            • Novice

            Comments about oreilly Learning Spark:

            Spark is a credible compute engine that scales massively. This book not only assists you in getting started with Spark, but also helps you to adjust your thinking so you can fully exploit parallel processing. If you have a need for Complex Event Processing, or "as it happens" processing rather than the batch processing of Hadoop, then Spark is your solution.

            (0 of 6 customers found this review helpful)

             
            4.0

            Really cool

            By Drodri

            from Barcelona

            Verified Buyer

            Comments about oreilly Learning Spark:

            Waiting for the new update!

            Pre-Order  Print: $39.99
            February 2015 (est.)