High Performance Spark
Best Practices for Scaling and Optimizing Apache Spark
Publisher: O'Reilly Media
Final Release Date: March 2016
Pages: 336

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

If you’ve successfully used Apache Spark to solve medium-sized problems, but still struggle to realize the "Spark promise" of unparalleled performance on big data, this book is for you. High Performance Spark shows you how to take advantage of Spark at scale, so you can grow beyond the novice level. It’s ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications.

  • Learn how to make Spark jobs run faster
  • Productionize exploratory data science with Spark
  • Handle even larger data sets with Spark
  • Reduce pipeline running times for faster insights
Customer Reviews



(based on 3 reviews)

Reviewed by 3 customers


(2 of 2 customers found this review helpful)


I reference this book regularly

By OperationsGuy

from Palo Alto

About Me Sys Admin

Verified Reviewer

Comments about oreilly High Performance Spark:

Deep dive with easy-to-understand diagrams. Very helpful, especially on joins!
Looking forward to more from these authors, especially on testing.

(2 of 2 customers found this review helpful)


The best book on writing production-ready Spark code

By Ewan

from Manchester, UK

About Me Developer

Verified Reviewer


  • Accurate
  • Concise
  • Easy to understand
  • Helpful examples
  • Well-written


    Best Uses

    • Expert
    • Intermediate
    • Novice

    Comments about oreilly High Performance Spark:

    There are quite a few good books on getting started with Spark, launching the interactive shell, running a few queries, and so on, but this book is fairly unique in showing you the ways to get the best of the Spark programming APIs.

    The chapter on "Joins", covering the RDD, DataFrame, and Dataset APIs, will alone save you hours, if not days, of research.

    (0 of 10 customers found this review helpful)


    I would like to purchase this book

    By Srini

    from India

    Comments about oreilly High Performance Spark:

    I would like to purchase this book, but it is still in the Early Release category. May I know when it will be ready with all topics?

    Does it cover equivalent Java examples as well?

    What knowledge do we need to have to understand the book?


    Buying Options
    Pre-Order  Print:  $39.99
    May 2017 (est.)