High Performance Spark
Best Practices for Scaling and Optimizing Apache Spark
Publisher: O'Reilly Media
Final Release Date: May 2017
Pages: 358

Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.

With this book, you’ll explore:

  • How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
  • The choice between data joins in Core Spark and Spark SQL
  • Techniques for getting the most out of standard RDD transformations
  • How to work around performance issues in Spark’s key/value pair paradigm
  • Writing high-performance Spark code without Scala or the JVM
  • How to test for functionality and performance when applying suggested improvements
  • Using Spark MLlib and Spark ML machine learning libraries
  • Spark’s Streaming components and external community packages
Customer Reviews

5.0 (based on 3 reviews)

Ratings Distribution

  • 5 Stars: 3
  • 4 Stars: 0
  • 3 Stars: 0
  • 2 Stars: 0
  • 1 Star: 0

100% of respondents would recommend this to a friend.

5.0

Super easy to read and learned a ton

By nickd-lucidworks

from san francisco, ca

About Me Developer

Verified Reviewer

Pros

  • Easy to understand

    Best Uses

    • Expert
    • Intermediate

    Comments about oreilly High Performance Spark:

    This book was to the point and was very easy to read. No fluff; you will burn through the pages super fast. I'd recommend it +1 +1 +1

    (3 of 3 customers found this review helpful)

     
    5.0

    I reference this book regularly

    By OperationsGuy

    from Palo Alto

    About Me Sys Admin

    Verified Reviewer

    Comments about oreilly High Performance Spark:

    Deep dive with easy-to-understand diagrams. Very helpful, especially on joins!
    Looking forward to more from these authors, especially on testing.

    (3 of 3 customers found this review helpful)

     
    5.0

    The best book on writing production-ready Spark code

    By Ewan

    from Manchester, UK

    About Me Developer

    Verified Reviewer

    Pros

    • Accurate
    • Concise
    • Easy to understand
    • Helpful examples
    • Well-written

      Best Uses

      • Expert
      • Intermediate
      • Novice

      Comments about oreilly High Performance Spark:

      There are quite a few good books on getting started with Spark, launching the interactive shell, running a few queries, and so on, but this book is fairly unique in showing you the ways to get the best of the Spark programming APIs.

      The chapter on "Joins" covering RDD, DataFrame, and Dataset APIs will save you hours if not days of research alone.

Buying Options

  • Ebook: $39.99 (formats: DAISY, ePub, Mobi, PDF)
  • Print & Ebook: $43.99
  • Print: $39.99