Book description
Get savvy with the R language and build projects aimed at analysis, visualization, and machine learning
About This Book
Proficiently analyze data and apply machine learning techniques
Generate visualizations and develop interactive visualizations and applications to understand R's data exploration functions
Construct a predictive model by using a variety of machine learning packages
Who This Book Is For
This Learning Path is ideal for those who have been exposed to R, but have not used it extensively yet. It covers the basics of using R and is written for new and intermediate R users interested in learning. This Learning Path also provides in-depth insights into professional techniques for analysis, visualization, and machine learning with R – it will help you increase your R expertise, regardless of your level of experience.
What You Will Learn
Get data into your R environment and prepare it for analysis
Perform exploratory data analyses and generate meaningful visualizations of the data
Generate various plots in R using the basic R plotting techniques
Create presentations and learn the basics of creating apps in R for your audience
Create and inspect a transaction dataset and perform association analysis with the Apriori algorithm
Visualize associations in various graph formats and find frequent itemsets using the Eclat algorithm
Build, tune, and evaluate predictive models with different machine learning packages
Incorporate R and Hadoop to solve machine learning problems on big data
In Detail
The R language is a powerful, open source, functional programming language. At its core, R is a statistical programming language that provides impressive tools to analyze data and create high-level graphics. This Learning Path is chock-full of recipes. Literally! It aims to excite you with awesome projects focused on analysis, visualization, and machine learning. We’ll start off with data analysis – this will show you ways to use R to generate professional analysis reports. We’ll then move on to visualizing our data – this provides you with all the guidance needed to get comfortable with data visualization with R. Finally, we’ll move into the world of machine learning – this introduces you to data classification, regression, clustering, association rule mining, and dimension reduction.
This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
R Data Analysis Cookbook by Viswa Viswanathan and Shanthi Viswanathan
R Data Visualization Cookbook by Atmajitsinh Gohil
Machine Learning with R Cookbook by Yu-Wei, Chiu (David Chiu)
Style and approach
This course creates a smooth learning path that will teach you how to analyze data and create stunning visualizations. The step-by-step instructions provided for each recipe in this comprehensive Learning Path will show you how to create machine learning projects with R.
Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.
Table of contents
- R: Recipes for Analysis, Visualization and Machine Learning
- Table of Contents
- Credits
- Preface
- 1. Module 1
- 1. A Simple Guide to R
- Installing packages and getting help in R
- Data types in R
- Special values in R
- Matrices in R
- Editing a matrix in R
- Data frames in R
- Editing a data frame in R
- Importing data in R
- Exporting data in R
- Writing a function in R
- Writing if else statements in R
- Basic loops in R
- Nested loops in R
- The apply, lapply, sapply, and tapply functions
- Using par to beautify a plot in R
- Saving plots
- 2. Practical Machine Learning with R
- 3. Acquire and Prepare the Ingredients – Your Data
- Introduction
- Reading data from CSV files
- Reading XML data
- Reading JSON data
- Reading data from fixed-width formatted files
- Reading data from R files and R libraries
- Removing cases with missing values
- Replacing missing values with the mean
- Removing duplicate cases
- Rescaling a variable to [0,1]
- Normalizing or standardizing data in a data frame
- Binning numerical data
- Creating dummies for categorical variables
- 4. What's in There? – Exploratory Data Analysis
- Introduction
- Creating standard data summaries
- Extracting a subset of a dataset
- Splitting a dataset
- Creating random data partitions
- Generating standard plots such as histograms, boxplots, and scatterplots
- Generating multiple plots on a grid
- Selecting a graphics device
- Creating plots with the lattice package
- Creating plots with the ggplot2 package
- Creating charts that facilitate comparisons
- Creating charts that help visualize a possible causality
- Creating multivariate plots
- 5. Where Does It Belong? – Classification
- Introduction
- Generating error/classification-confusion matrices
- Generating ROC charts
- Building, plotting, and evaluating – classification trees
- Using random forest models for classification
- Classifying using Support Vector Machine
- Classifying using the Naïve Bayes approach
- Classifying using the KNN approach
- Using neural networks for classification
- Classifying using linear discriminant function analysis
- Classifying using logistic regression
- Using AdaBoost to combine classification tree models
- 6. Give Me a Number – Regression
- Introduction
- Computing the root mean squared error
- Building KNN models for regression
- Performing linear regression
- Performing variable selection in linear regression
- Building regression trees
- Building random forest models for regression
- Using neural networks for regression
- Performing k-fold cross-validation
- Performing leave-one-out-cross-validation to limit overfitting
- 7. Can You Simplify That? – Data Reduction Techniques
- 8. Lessons from History – Time Series Analysis
- 9. It's All About Your Connections – Social Network Analysis
- Introduction
- Downloading social network data using public APIs
- Creating adjacency matrices and edge lists
- Plotting social network data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Specifying plotting preferences
- Plotting directed graphs
- Creating a graph object with weights
- Extracting the network as an adjacency matrix from the graph object
- Extracting an adjacency matrix with weights
- Extracting edge list from graph object
- Creating bipartite network graph
- Generating projections of a bipartite network
- See also...
- Computing important network metrics
- 10. Put Your Best Foot Forward – Document and Present Your Analysis
- 11. Work Smarter, Not Harder – Efficient and Elegant R Code
- Introduction
- Exploiting vectorized operations
- Processing entire rows or columns using the apply function
- Applying a function to all elements of a collection with lapply and sapply
- Applying functions to subsets of a vector
- Using the split-apply-combine strategy with plyr
- Slicing, dicing, and combining data with data tables
- 12. Where in the World? – Geospatial Analysis
- Introduction
- Downloading and plotting a Google map of an area
- Overlaying data on the downloaded Google map
- Importing ESRI shape files into R
- Using the sp package to plot geographic data
- Getting maps from the maps package
- Creating spatial data frames from regular data frames containing spatial and other data
- Creating spatial data frames by combining regular data frames with spatial objects
- Adding variables to an existing spatial data frame
- 13. Playing Nice – Connecting to Other Systems
- 2. Module 2
- 1. Basic and Interactive Plots
- Introduction
- Introducing a scatter plot
- Scatter plots with texts, labels, and lines
- Connecting points in a scatter plot
- Generating an interactive scatter plot
- A simple bar plot
- An interactive bar plot
- A simple line plot
- Line plot to tell an effective story
- Generating an interactive Gantt/timeline chart in R
- Merging histograms
- Making an interactive bubble plot
- Constructing a waterfall plot in R
- 2. Heat Maps and Dendrograms
- 3. Maps
- 4. The Pie Chart and Its Alternatives
- 5. Adding the Third Dimension
- 6. Data in Higher Dimensions
- 7. Visualizing Continuous Data
- Introduction
- Generating a candlestick plot
- Generating interactive candlestick plots
- Generating a decomposed time series
- Plotting a regression line
- Constructing a box and whiskers plot
- Generating a violin plot
- Generating a quantile-quantile plot (QQ plot)
- Generating a density plot
- Generating a simple correlation plot
- 8. Visualizing Text and XKCD-style Plots
- 9. Creating Applications in R
- 3. Module 3
- 1. Data Exploration with RMS Titanic
- Introduction
- Reading a Titanic dataset from a CSV file
- Converting types on character variables
- Detecting missing values
- Imputing missing values
- Exploring and visualizing data
- Predicting passenger survival with a decision tree
- Validating the power of prediction with a confusion matrix
- Assessing performance with the ROC curve
- 2. R and Statistics
- Introduction
- Understanding data sampling in R
- Operating a probability distribution in R
- Working with univariate descriptive statistics in R
- Performing correlations and multivariate analysis
- Operating linear regression and multivariate analysis
- Conducting an exact binomial test
- Performing student's t-test
- Performing the Kolmogorov-Smirnov test
- Understanding the Wilcoxon Rank Sum and Signed Rank test
- Working with Pearson's Chi-squared test
- Conducting a one-way ANOVA
- Performing a two-way ANOVA
- 3. Understanding Regression Analysis
- Introduction
- Fitting a linear regression model with lm
- Summarizing linear model fits
- Using linear regression to predict unknown values
- Generating a diagnostic plot of a fitted model
- Fitting a polynomial regression model with lm
- Fitting a robust linear regression model with rlm
- Studying a case of linear regression on SLID data
- Applying the Gaussian model for generalized linear regression
- Applying the Poisson model for generalized linear regression
- Applying the Binomial model for generalized linear regression
- Fitting a generalized additive model to data
- Visualizing a generalized additive model
- Diagnosing a generalized additive model
- 4. Classification (I) – Tree, Lazy, and Probabilistic
- Introduction
- Preparing the training and testing datasets
- Building a classification model with recursive partitioning trees
- Visualizing a recursive partitioning tree
- Measuring the prediction performance of a recursive partitioning tree
- Pruning a recursive partitioning tree
- Building a classification model with a conditional inference tree
- Visualizing a conditional inference tree
- Measuring the prediction performance of a conditional inference tree
- Classifying data with the k-nearest neighbor classifier
- Classifying data with logistic regression
- Classifying data with the Naïve Bayes classifier
- 5. Classification (II) – Neural Network and SVM
- Introduction
- Classifying data with a support vector machine
- Choosing the cost of a support vector machine
- Visualizing an SVM fit
- Predicting labels based on a model trained by a support vector machine
- Tuning a support vector machine
- Training a neural network with neuralnet
- Visualizing a neural network trained by neuralnet
- Predicting labels based on a model trained by neuralnet
- Training a neural network with nnet
- Predicting labels based on a model trained by nnet
- 6. Model Evaluation
- Introduction
- Estimating model performance with k-fold cross-validation
- Performing cross-validation with the e1071 package
- Performing cross-validation with the caret package
- Ranking the variable importance with the caret package
- Ranking the variable importance with the rminer package
- Finding highly correlated features with the caret package
- Selecting features using the caret package
- Measuring the performance of the regression model
- Measuring prediction performance with a confusion matrix
- Measuring prediction performance using ROCR
- Comparing an ROC curve using the caret package
- Measuring performance differences between models with the caret package
- 7. Ensemble Learning
- Introduction
- Classifying data with the bagging method
- Performing cross-validation with the bagging method
- Classifying data with the boosting method
- Performing cross-validation with the boosting method
- Classifying data with gradient boosting
- Calculating the margins of a classifier
- Calculating the error evolution of the ensemble method
- Classifying data with random forest
- Estimating the prediction errors of different classifiers
- 8. Clustering
- Introduction
- Clustering data with hierarchical clustering
- Cutting trees into clusters
- Clustering data with the k-means method
- Drawing a bivariate cluster plot
- Comparing clustering methods
- Extracting silhouette information from clustering
- Obtaining the optimum number of clusters for k-means
- Clustering data with the density-based method
- Clustering data with the model-based method
- Visualizing a dissimilarity matrix
- Validating clusters externally
- 9. Association Analysis and Sequence Mining
- Introduction
- Transforming data into transactions
- Displaying transactions and associations
- Mining associations with the Apriori rule
- Pruning redundant rules
- Visualizing association rules
- Mining frequent itemsets with Eclat
- Creating transactions with temporal information
- Mining frequent sequential patterns with cSPADE
- 10. Dimension Reduction
- Introduction
- Performing feature selection with FSelector
- Performing dimension reduction with PCA
- Determining the number of principal components using the scree test
- Determining the number of principal components using the Kaiser method
- Visualizing multivariate data using biplot
- Performing dimension reduction with MDS
- Reducing dimensions with SVD
- Compressing images with SVD
- Performing nonlinear dimension reduction with ISOMAP
- Performing nonlinear dimension reduction with Local Linear Embedding
- 11. Big Data Analysis (R and Hadoop)
- Introduction
- Preparing the RHadoop environment
- Installing rmr2
- Installing rhdfs
- Operating HDFS with rhdfs
- Implementing a word count problem with RHadoop
- Comparing the performance between an R MapReduce program and a standard R program
- Testing and debugging the rmr2 program
- Installing plyrmr
- Manipulating data with plyrmr
- Conducting machine learning with RHadoop
- Configuring RHadoop clusters on Amazon EMR
- A. Resources for R and Machine Learning
- B. Dataset – Survival of Passengers on the Titanic
- A. Bibliography
- Index
Product information
- Title: R: Recipes for Analysis, Visualization and Machine Learning
- Author(s): Viswa Viswanathan, Shanthi Viswanathan, Atmajitsinh Gohil, Yu-Wei, Chiu (David Chiu)
- Release date: November 2016
- Publisher(s): Packt Publishing
- ISBN: 9781787289598