By Jared Lander
Publisher: Pearson
Final Release Date: May 2016
Run time: 15 hours 10 minutes
15+ Hours of Video Instruction
R Programming Data Analyst Learning Path is a tour through the most important parts of R, the statistical programming language, from the very basics to complex modeling. It covers reading data, programming basics, visualization, data munging, regression, classification, clustering, modern machine learning, network analysis, web graphics, and techniques for dealing with large data, both in memory and in databases.
Description
This 15-hour video teaches you how to program in R even if you are unfamiliar with statistical techniques. It starts with the basics of using R and progresses into data manipulation and model building. Users learn through hands-on practice with the code and techniques. New material covers chaining commands, faster data manipulation, new ways to read rectangular data into R, testing code, and the hot package Shiny.
Based on a course on R and Big Data taught by the author at Columbia
 Designed from the ground up to help viewers quickly overcome R’s learning curve
 Packed with hands-on practice opportunities and realistic, downloadable code examples
 Presented by an author with unsurpassed experience teaching statistical programming and modeling to novices
 For every potential R user: programmers, data scientists, DBAs, marketers, quants, scientists, policymakers, and many others
About the Instructor
Jared P. Lander is the Chief Data Scientist of Lander Analytics, the organizer of the New York Open Statistical Programming Meetup (formerly the R Meetup), and an adjunct professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization, and statistical computing. He is the author of R for Everyone, a book about R programming geared toward data scientists and non-statisticians alike. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world. He is a member of the Strata New York selection committee.
Skill Level
 Beginner
 Intermediate
 Advanced
What You Will Learn
 Installing R
 Basic math
 Working with variables and different data types
 Matrix algebra
 data.frames
 Reading data
 Data aggregation and manipulation
 plyr
 dplyr
 Making statistical graphs
 Manipulate text
 Automatically generate reports and slideshows
 Display data with popular JavaScript libraries
 Build Shiny dashboards
 Build R packages
 Incorporate C++ for faster code
 Basic statistics
 Linear models
 Generalized linear models
 Model validation
 Decision trees
 Random forests
 Bootstrap
 Time series analysis
 Clustering
 Network analysis
 Automatic parameter tuning
 Bayesian regression using Stan
Who Should Take This Course
Part 1 of the lessons is geared toward people who are new to either R or programming in general.
Part 2 is for R programmers who already have an intermediate level of knowledge, such as that gained from reading R for Everyone or from viewing Part 1.
Course Requirements
Table of Contents
Part 1: R as a Tool
Lesson 1. Getting Started with R
R can only be used after installation, which fortunately is just as simple as installing any other program. In this lesson, you learn about where to download R, how to decide on the best version, how to install it, and you get familiar with its environment, using RStudio as a front end. We also take a look at the package system.
Lesson 2. The Basic Building Blocks in R
R is a flexible and robust programming language, and using it requires understanding how it handles data. We learn about performing basic math in R; storing various types of data, such as numeric, integer, character, and time-based, in variables; and calling functions on the data.
Lesson 3. Advanced Data Structures in R
Like many other languages, R offers more complex storage mechanisms such as vectors, arrays, matrices, and lists. We take a look at those and the data.frame, a special storage type that strongly resembles a spreadsheet and is part of what makes working with data in R such a pleasure.
Lesson 4. Reading Data into R
Data is abundant in the world, so analyzing it is just a matter of getting the data into R. There are many ways of doing so, the most common being reading from a CSV file or database. We cover these techniques, and also importing from other statistical tools, scraping websites, and reading Excel files.
Lesson 5. Making Statistical Graphs
Visualizing data is a crucial part of data science, both in the discovery phase and when reporting results. R has long been known for its capability to produce compelling plots, and Hadley Wickham’s ggplot2 package makes it even easier to produce better-looking graphics. We cover histograms, scatterplots, boxplots, line charts, and more, in both base graphics and ggplot2, and then explore the newer packages ggvis and rCharts.
Lesson 6. Basics of Programming
R has all the standard components of a programming language, such as functions, if statements, and loops, all with their own caveats and quirks. We start with the requisite “Hello, World!” function and learn about arguments to functions, the regular if statement and the vectorized version, and how to build loops and why they should be avoided.
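As a rough illustration of the ideas named in this lesson (not taken from the course materials), the sketch below shows a "Hello, World!" style function, a regular if statement, and the vectorized ifelse() next to an equivalent for loop. The function name say_hello and the sample vector are hypothetical.

```r
# Hypothetical helper for illustration; not from the course materials.
say_hello <- function(name = "World") {
  sprintf("Hello, %s!", name)
}

x <- c(-2, 0, 3)

# Regular if: tests a single condition.
if (length(x) > 0) {
  greeting <- say_hello("R")
}

# Vectorized version: ifelse() tests every element in one call.
sign_label <- ifelse(x >= 0, "non-negative", "negative")

# A for loop that builds the same result one element at a time.
loop_label <- character(length(x))
for (i in seq_along(x)) {
  loop_label[i] <- if (x[i] >= 0) "non-negative" else "negative"
}
```

Both approaches give the same labels; the vectorized call is shorter and is usually the more idiomatic choice in R.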
Lesson 7. Data Munging
Data scientists often bemoan that 80% of their work is manipulating data. As such, R has many tools for this, which are, contrary to what Python users may say, easy to use. We see how R excels at group operations using apply, lapply, and the plyr package. We also take a look at its facilities for joining, combining, and rearranging data. Then we speed that up with tidyr, data.table, and dplyr.
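For a sense of what group operations look like, here is a minimal base-R sketch using apply, lapply, and aggregate. The data are made up for illustration, and the plyr, tidyr, data.table, and dplyr equivalents covered in the lesson are not shown.

```r
# Made-up data for illustration only.
m <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns, filled column-wise

# apply: run a function over the rows (MARGIN = 1) of a matrix.
row_sums <- apply(m, 1, sum)      # rows are (1, 3, 5) and (2, 4, 6)

# lapply: run a function over each element of a list; returns a list.
lst <- list(a = 1:3, b = 4:6)
means <- lapply(lst, mean)

# aggregate: compute a summary for each group in a data.frame.
df <- data.frame(group = c("a", "a", "b"), value = c(1, 2, 10))
group_means <- aggregate(value ~ group, data = df, FUN = mean)
```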
Lesson 8. In-Depth with dplyr
dplyr has become such an indispensable tool, nearly superseding plyr, that it is worth devoting extra attention to. So we examine its select, filter, mutate, group_by, and summarize functions, among others.
Lesson 9. Manipulating Strings
Text data is becoming more pervasive in the world, and fortunately, R provides ways for both combining text and ripping it apart, which we walk through. We also examine R’s extensive regular expression capabilities.
Lesson 10. Reports and Slideshows with knitr
Successfully delivering the results of an analysis can be just as important as the analysis itself, so it is important to communicate them in an effective way. In this lesson, we learn how to use knitr and rmarkdown to write both static and interactive results in the form of pdf documents, websites, HTML5 slideshows, and even Word documents.
Lesson 11. Include HTML Widgets in HTML Documents
Recent years have seen the advance of JavaScript-powered displays of information, and the htmlwidgets package enables R to take advantage of arbitrary JavaScript libraries. In particular, we look at datatable for a tabular display of data, bokeh for rich web plots, and leaflet for powerful mapping.
Lesson 12. Shiny
Built by RStudio, Shiny is a tool for building interactive data displays and dashboards all within R. This allows the R programmer to convey results in a compelling, user-rich experience in a language he or she is familiar with.
Lesson 13. Package Building
Building packages is a great way to contribute back to the R community, and doing so has never been easier thanks to Hadley Wickham's devtools package. This lesson covers all the requirements for a package and how to go about authoring, testing, and distributing one.
Lesson 14. Rcpp for Faster Code
Sometimes pure R code is not fast enough, and extra speed is required. Rcpp enables R programmers to seamlessly integrate C++ code into their R code. We go over the basics of getting the two languages working together, write some speedy functions in C++, and even integrate C++ into R packages.
Part 2: R for Statistics, Modeling, and Machine Learning
Lesson 15. Basic Statistics
Naturally, R has all the basics when it comes to statistics, such as means, variance, correlation, t-tests, and ANOVAs. We look at all the different ways those can be computed.
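The quantities listed here are all available in base R. The following is a minimal sketch on simulated data; the variables and the seed are arbitrary choices for illustration, not course material.

```r
# Simulated data; the seed and parameters are arbitrary.
set.seed(1)
x <- rnorm(50, mean = 10)
y <- x + rnorm(50)

x_mean <- mean(x)        # sample mean
x_var  <- var(x)         # sample variance
xy_cor <- cor(x, y)      # Pearson correlation

# One-sample t-test of whether the mean of x differs from 10.
tt <- t.test(x, mu = 10)

# One-way ANOVA comparing y across three arbitrary groups.
g <- factor(rep(c("a", "b", "c"), length.out = 50))
fit <- aov(y ~ g)
```

Calling summary(fit) or printing tt shows the test statistics and p-values.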
Lesson 16. Linear Models
The workhorse of statistics is regression and its extensions. This consists of linear models, generalized linear models (including logistic and Poisson regression), and survival models. We look at how to fit these models in R and how to evaluate them using measures such as mean squared error, deviance, and AIC.
Lesson 17. Other Models
Beyond regression, there are many other types of models that can be fit to data. Models covered include regularization with the elastic net, Bayesian shrinkage, nonlinear models such as nonlinear least squares, splines and generalized additive models, decision trees, and random forests.
Lesson 18. Time Series
Special care must be taken with data where there is time-based correlation, otherwise known as autocorrelation. We look at some common methods for dealing with time series such as ARIMA, VAR, and GARCH.
Lesson 19. Clustering
A focal point of modern machine learning is clustering, the partitioning of data into groups. We explore three popular methods: K-means, K-medoids, and hierarchical clustering.
Lesson 20. More Machine Learning
Two areas seeing increasing interest are recommendation engines and text mining, which we illustrate with RecommenderLab, RTextTools, and the irlba package for fast matrix factorization.
Lesson 21. Network Analysis
The world is rich with network data that are nicely studied with graphical models. We show you how to analyze and visualize networks using the igraph package.
Lesson 22. Automatic Parameter Tuning with Caret
Machine learning models often have parameters that need tuning, which can significantly affect the quality of the model. The Caret package, by Max Kuhn, makes optimal parameter values easy to find.
Lesson 23. Fit a Bayesian Model with RStan
Bayesian data analysis uses simulations to fit both simple and complex models. Andrew Gelman’s new language, Stan, makes this faster and easier than ever before. We explore this by fitting a simple linear regression and a varying-intercept multilevel model.
About LiveLessons Video Training
The LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home and Office Technologies, Business and Management, and more. View all LiveLessons on InformIT at: http://www.informit.com/livelessons.

 Introduction

 Lesson 1: Getting Started with R

Learning objectives 00m 28s 
1.1 Download and Install R 06m 23s 
1.2 Work in the R Environment 18m 50s 
1.3 Install and load packages 04m 49s
 Lesson 2: The Basic Building Blocks in R

Learning objectives 00m 26s 
2.1 Use R as a calculator 03m 43s 
2.2 Work with variables 04m 11s 
2.3 Understand the different data types 11m 32s 
2.4 Store data in vectors 16m 36s 
2.5 Call functions 04m 03s
 Lesson 3: Advanced Data Structures in R

Learning objectives 00m 25s 
3.1 Create and access information in data.frames 17m 20s 
3.2 Create and access information in lists 10m 57s 
3.3 Create and access information in matrices 08m 01s
 Lesson 4: Reading Data into R

Learning objectives 00m 26s 
4.1 Read a CSV into R 05m 58s 
4.2 Read an Excel Spreadsheet into R 04m 38s 
4.3 Read from databases 05m 59s 
4.4 Read data files from other statistical tools 01m 17s 
4.5 Load binary R files 04m 40s 
4.6 Load data included with R 01m 48s 
4.7 Scrape data from the web 02m 28s 
4.8 Read XML data 27m 23s
 Lesson 5: Making Statistical Graphs

Learning objectives 00m 34s 
5.1 Find the diamonds in the data 01m 13s 
5.2 Make histograms with base graphics 01m 29s 
5.3 Make scatterplots with base graphics 02m 01s 
5.4 Make boxplots with base graphics 01m 39s 
5.5 Get familiar with ggplot2 02m 30s 
5.6 Plot histograms and densities with ggplot2 03m 51s 
5.7 Make scatterplots with ggplot2 05m 12s 
5.8 Make boxplots and violin plots with ggplot2 04m 24s 
5.9 Make line plots 08m 21s 
5.10 Create small multiples 04m 01s 
5.11 Control colors and shapes 01m 18s 
5.12 Add themes to graphs 02m 18s 
5.13 Use Web graphics 29m 48s
 Lesson 6: Basics of Programming

Learning objectives 00m 26s 
6.1 Write the classic "Hello, World!" example 02m 04s 
6.2 Understand the basics of function arguments 10m 32s 
6.3 Return a value from a function 02m 47s 
6.4 Gain flexibility with do.call 03m 46s 
6.5 Use "if" statements to control program flow 02m 08s 
6.6 Stagger "if" statements with "else" 05m 33s 
6.7 Check multiple statements with switch 03m 51s 
6.8 Run checks on entire vectors 05m 17s 
6.9 Check compound statements 05m 40s 
6.10 Iterate with a for loop 06m 07s 
6.11 Iterate with a while loop 01m 30s 
6.12 Control loops with break and next 02m 05s
 Lesson 7: Data Munging

Learning objectives 00m 35s 
7.1 Repeat an operation on a matrix using apply 04m 45s 
7.2 Repeat an operation on a list 03m 05s 
7.3 Apply a function over multiple lists with mapply 04m 34s 
7.4 Perform group summaries with the aggregate function 05m 26s 
7.5 Do group operations with the plyr Package 17m 18s 
7.6 Combine datasets 03m 51s 
7.7 Join datasets 05m 56s 
7.8 Switch storage paradigms 05m 11s 
7.9 Use tidyr 02m 50s 
7.10 Get faster group operations 22m 02s
 Lesson 8: In-Depth with dplyr

Learning objectives 00m 22s 
8.1 Use tbl 01m 48s 
8.2 Use select to choose columns 03m 08s 
8.3 Use filter to choose rows 03m 38s 
8.4 Use slice to choose rows 01m 08s 
8.5 Use mutate to change or create columns 02m 39s 
8.6 Use summarize for quick computation on tbl 01m 34s 
8.7 Use group_by to split the data 02m 35s 
8.8 Apply arbitrary functions with do 06m 50s
 Lesson 9: Manipulating Strings

Learning objectives 00m 20s 
9.1 Combine strings together 07m 28s 
9.2 Extract text 32m 00s
 Lesson 10: Reports and Slideshows with knitr

Learning objectives 00m 29s 
10.1 Understand the basics of LaTeX 07m 16s 
10.2 Weave R code into LaTeX using knitr 05m 33s 
10.3 Understand the basics of Markdown 02m 45s 
10.4 Understand the basics of RMarkdown 04m 55s 
10.5 Weave R code into Markdown using knitr 02m 53s 
10.6 Convert Markdown files to Word 01m 29s 
10.7 Convert Markdown to PDF 01m 25s 
10.8 Create slideshows with RMarkdown 03m 09s 
10.9 Write equations with RMarkdown 07m 13s
 Lesson 11: Include HTML Widgets in HTML Documents

Learning objectives 00m 28s 
11.1 Work with datatables of tabular data 06m 09s 
11.2 Use rbokeh 08m 31s 
11.3 Use Leaflet for mapping 07m 11s
 Lesson 12: Shiny

Learning objectives 00m 22s 
12.1 Use shiny objects in a markdown document 13m 56s 
12.2 Work with ui.r and server.r files 08m 21s
 Lesson 13: Package Building

Learning objectives 00m 23s 
13.1 Understand the folder structure and files in a package 05m 25s 
13.2 Write and document functions 07m 32s 
13.3 Check and build a package 02m 09s 
13.4 Test R code 06m 58s 
13.5 Submit a package to CRAN 00m 46s
 Lesson 14: Rcpp for Faster Code

Learning objectives 00m 29s 
14.1 Understand the basics of C++ with R 01m 47s 
14.2 Write a C++ function for R 04m 35s 
14.3 Use Rcpp syntactic sugar 05m 51s 
14.4 Sum in C++ 05m 35s 
14.5 Write a package in R 09m 37s 
14.6 Write a package with C++ code 06m 04s
 Summary

Part 1: R as a Tool: Summary 01m 02s
 Introduction

Part 2: R for Statistics, Modeling, and Machine Learning: Introduction 02m 13s
 Lesson 15: Basic Statistics

Learning objectives 00m 20s 
15.1 Draw numbers from probability distributions 21m 10s 
15.2 Calculate averages, standard deviations and correlations 16m 13s 
15.3 Compare samples with t-tests and analysis of variance 18m 58s
 Lesson 16: Linear Models

Learning objectives 00m 28s 
16.1 Fit simple linear models 10m 15s 
16.2 Explore the data 08m 33s 
16.3 Fit multiple regression models 19m 16s 
16.4 Fit logistic regression 10m 06s 
16.5 Fit Poisson regression 07m 05s 
16.6 Analyze survival data 12m 01s 
16.7 Assess model quality with residuals 05m 15s 
16.8 Compare models 07m 18s 
16.9 Judge accuracy using cross-validation 09m 06s
16.10 Estimate uncertainty with the bootstrap 06m 23s 
16.11 Choose variables using stepwise selection 02m 42s
 Lesson 17: Other Models

Learning objectives 00m 27s 
17.1 Select variables and improve predictions with the elastic net 14m 14s 
17.2 Decrease uncertainty with weakly informative priors 08m 53s 
17.3 Fit nonlinear least squares 05m 16s 
17.4 Use Splines 06m 48s 
17.5 Use GAMs 05m 24s 
17.6 Fit decision trees to make a random forest 06m 34s
 Lesson 18: Time Series

Learning objectives 00m 20s 
18.1 Understand ACF and PACF 07m 15s 
18.2 Fit and assess ARIMA models 05m 13s 
18.3 Use VAR for multivariate time series 08m 06s 
18.4 Use GARCH for better volatility modeling 09m 24s
 Lesson 19: Clustering

Learning objectives 00m 20s 
19.1 Partition data with K-means 12m 26s
19.2 Robustly cluster, even with categorical data, with PAM 02m 13s 
19.3 Perform hierarchical clustering 05m 38s
 Lesson 20: More Machine Learning

Learning objectives 00m 21s 
20.1 Build a recommendation engine with RecommenderLab 13m 13s 
20.2 Mine text with RTextTools 09m 13s 
20.3 Perform matrix factorization using irlba 04m 04s
 Lesson 21: Network Analysis

Learning objectives 00m 18s 
21.1 Get started with igraph 08m 16s 
21.2 Read edgelists 07m 11s 
21.3 Understand common graph metrics 10m 12s 
21.4 Use centrality measures 05m 59s 
21.5 Utilize more graph operations 04m 15s
 Lesson 22: Automatic Parameter Tuning with Caret

Learning objectives 00m 19s 
22.1 Establish optimal tree depth for rpart 06m 18s 
22.2 Choose the best number of trees for a random forest 03m 35s
 Lesson 23: Fit a Bayesian Model with RStan

Learning objectives 00m 25s 
23.1 Understand the Stan computing paradigm 01m 33s 
23.2 Fit a simple regression model 06m 53s 
23.3 Fit a multilevel model with Stan 06m 42s
 Summary

Part 2: R for Statistics, Modeling, and Machine Learning: Summary 00m 49s