Mastering Spark with R

by Javier Luraschi, Kevin Kuo, Edgar Ruiz

Released October 2019

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781492046370

Book description

If you’re like most R users, you have a deep knowledge of, and love for, statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.

Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.

Analyze, explore, transform, and visualize data in Apache Spark with R
Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
Perform analysis and modeling across many machines using distributed computing techniques
Use large-scale data from multiple sources and different formats with ease from within Spark
Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

Table of contents

Foreword
Preface
1. Introduction
  1. Overview
  2. Hadoop
  3. Spark
  4. R
  5. sparklyr
  6. Recap
2. Getting Started
  1. Overview
  2. Prerequisites
    1. Installing sparklyr
    2. Installing Spark
  3. Connecting
  4. Using Spark
    1. Web Interface
    2. Analysis
    3. Modeling
    4. Data
    5. Extensions
    6. Distributed R
    7. Streaming
    8. Logs
  5. Disconnecting
  6. Using RStudio
  7. Resources
  8. Recap
3. Analysis
  1. Overview
  2. Import
  3. Wrangle
    1. Built-in Functions
    2. Correlations
  4. Visualize
    1. Using ggplot2
    2. Using dbplot
  5. Model
    1. Caching
  6. Communicate
  7. Recap
4. Modeling
5. Pipelines
6. Clusters
  1. Overview
  2. On-Premises
    1. Managers
    2. Distributions
  3. Cloud
    1. Amazon
    2. Databricks
    3. Google
    4. IBM
    5. Microsoft
    6. Qubole
  4. Kubernetes
  5. Tools
    1. RStudio
    2. Jupyter
    3. Livy
  6. Recap
7. Connections
  1. Overview
    1. Edge Nodes
    2. Spark Home
  2. Local
  3. Standalone
  4. YARN
    1. YARN Client
    2. YARN Cluster
  5. Livy
  6. Mesos
  7. Kubernetes
  8. Cloud
  9. Batches
  10. Tools
  11. Multiple Connections
  12. Troubleshooting
  13. Recap
8. Data
  1. Overview
  2. Reading Data
    1. Paths
    2. Schema
    3. Memory
    4. Columns
  3. Writing Data
  4. Copying Data
  5. File Formats
    1. CSV
    2. JSON
    3. Parquet
    4. Others
  6. File Systems
  7. Storage Systems
    1. Hive
    2. Cassandra
    3. JDBC
  8. Recap
9. Tuning
10. Extensions
  1. Overview
  2. H2O
  3. Graphs
  4. XGBoost
  5. Deep Learning
  6. Genomics
  7. Spatial
  8. Troubleshooting
  9. Recap
11. Distributed R
12. Streaming
  1. Overview
  2. Transformations
  3. Kafka
  4. Shiny
  5. Recap
13. Contributing
A. Supplemental Code References
Index
