R Cookbook
Proven Recipes for Data Analysis, Statistics, and Graphics
Publisher: O'Reilly Media
Release Date: March 2011
Pages: 438
With more than 200 practical recipes, this book helps you perform data analysis with R quickly and efficiently. The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.
Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an experienced data programmer, it will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.
 Create vectors, handle variables, and perform other basic functions
 Input and output data
 Tackle data structures such as matrices, lists, factors, and data frames
 Work with probability, probability distributions, and random variables
 Calculate statistics and confidence intervals, and perform statistical tests
 Create a variety of graphic displays
 Build statistical models with linear regressions and analysis of variance (ANOVA)
 Explore advanced statistical techniques, such as finding clusters in your data
"Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language—one practical example at a time." —Jeffrey Ryan, software consultant and R package author
Table of Contents

Chapter 1 Getting Started and Getting Help

Introduction

Downloading and Installing R

Starting R

Entering Commands

Exiting from R

Interrupting R

Viewing the Supplied Documentation

Getting Help on a Function

Searching the Supplied Documentation

Getting Help on a Package

Searching the Web for Help

Finding Relevant Functions and Packages

Searching the Mailing Lists

Submitting Questions to the Mailing Lists


Chapter 2 Some Basics

Introduction

Printing Something

Setting Variables

Listing Variables

Deleting Variables

Creating a Vector

Computing Basic Statistics

Creating Sequences

Comparing Vectors

Selecting Vector Elements

Performing Vector Arithmetic

Getting Operator Precedence Right

Defining a Function

Typing Less and Accomplishing More

Avoiding Some Common Mistakes


Chapter 3 Navigating the Software

Introduction

Getting and Setting the Working Directory

Saving Your Workspace

Viewing Your Command History

Saving the Result of the Previous Command

Displaying the Search Path

Accessing the Functions in a Package

Accessing Built-in Datasets

Viewing the List of Installed Packages

Installing Packages from CRAN

Setting a Default CRAN Mirror

Suppressing the Startup Message

Running a Script

Running a Batch Script

Getting and Setting Environment Variables

Locating the R Home Directory

Customizing R


Chapter 4 Input and Output

Introduction

Entering Data from the Keyboard

Printing Fewer Digits (or More Digits)

Redirecting Output to a File

Listing Files

Dealing with “Cannot Open File” in Windows

Reading Fixed-Width Records

Reading Tabular Data Files

Reading from CSV Files

Writing to CSV Files

Reading Tabular or CSV Data from the Web

Reading Data from HTML Tables

Reading Files with a Complex Structure

Reading from MySQL Databases

Saving and Transporting Objects


Chapter 5 Data Structures

Introduction

Appending Data to a Vector

Inserting Data into a Vector

Understanding the Recycling Rule

Creating a Factor (Categorical Variable)

Combining Multiple Vectors into One Vector and a Factor

Creating a List

Selecting List Elements by Position

Selecting List Elements by Name

Building a Name/Value Association List

Removing an Element from a List

Flatten a List into a Vector

Removing NULL Elements from a List

Removing List Elements Using a Condition

Initializing a Matrix

Performing Matrix Operations

Giving Descriptive Names to the Rows and Columns of a Matrix

Selecting One Row or Column from a Matrix

Initializing a Data Frame from Column Data

Initializing a Data Frame from Row Data

Appending Rows to a Data Frame

Preallocating a Data Frame

Selecting Data Frame Columns by Position

Selecting Data Frame Columns by Name

Selecting Rows and Columns More Easily

Changing the Names of Data Frame Columns

Editing a Data Frame

Removing NAs from a Data Frame

Excluding Columns by Name

Combining Two Data Frames

Merging Data Frames by Common Column

Accessing Data Frame Contents More Easily

Converting One Atomic Value into Another

Converting One Structured Data Type into Another


Chapter 6 Data Transformations

Introduction

Splitting a Vector into Groups

Applying a Function to Each List Element

Applying a Function to Every Row

Applying a Function to Every Column

Applying a Function to Groups of Data

Applying a Function to Groups of Rows

Applying a Function to Parallel Vectors or Lists


Chapter 7 Strings and Dates

Introduction

Getting the Length of a String

Concatenating Strings

Extracting Substrings

Splitting a String According to a Delimiter

Replacing Substrings

Seeing the Special Characters in a String

Generating All Pairwise Combinations of Strings

Getting the Current Date

Converting a String into a Date

Converting a Date into a String

Converting Year, Month, and Day into a Date

Getting the Julian Date

Extracting the Parts of a Date

Creating a Sequence of Dates


Chapter 8 Probability

Introduction

Counting the Number of Combinations

Generating Combinations

Generating Random Numbers

Generating Reproducible Random Numbers

Generating a Random Sample

Generating Random Sequences

Randomly Permuting a Vector

Calculating Probabilities for Discrete Distributions

Calculating Probabilities for Continuous Distributions

Converting Probabilities to Quantiles

Plotting a Density Function


Chapter 9 General Statistics

Introduction

Summarizing Your Data

Calculating Relative Frequencies

Tabulating Factors and Creating Contingency Tables

Testing Categorical Variables for Independence

Calculating Quantiles (and Quartiles) of a Dataset

Inverting a Quantile

Converting Data to Z-Scores

Testing the Mean of a Sample (t Test)

Forming a Confidence Interval for a Mean

Forming a Confidence Interval for a Median

Testing a Sample Proportion

Forming a Confidence Interval for a Proportion

Testing for Normality

Testing for Runs

Comparing the Means of Two Samples

Comparing the Locations of Two Samples Nonparametrically

Testing a Correlation for Significance

Testing Groups for Equal Proportions

Performing Pairwise Comparisons Between Group Means

Testing Two Samples for the Same Distribution


Chapter 10 Graphics

Introduction

Creating a Scatter Plot

Adding a Title and Labels

Adding a Grid

Creating a Scatter Plot of Multiple Groups

Adding a Legend

Plotting the Regression Line of a Scatter Plot

Plotting All Variables Against All Other Variables

Creating One Scatter Plot for Each Factor Level

Creating a Bar Chart

Adding Confidence Intervals to a Bar Chart

Coloring a Bar Chart

Plotting a Line from x and y Points

Changing the Type, Width, or Color of a Line

Plotting Multiple Datasets

Adding Vertical or Horizontal Lines

Creating a Box Plot

Creating One Box Plot for Each Factor Level

Creating a Histogram

Adding a Density Estimate to a Histogram

Creating a Discrete Histogram

Creating a Normal Quantile-Quantile (Q-Q) Plot

Creating Other Quantile-Quantile Plots

Plotting a Variable in Multiple Colors

Graphing a Function

Pausing Between Plots

Displaying Several Figures on One Page

Opening Additional Graphics Windows

Writing Your Plot to a File

Changing Graphical Parameters


Chapter 11 Linear Regression and ANOVA

Introduction

Performing Simple Linear Regression

Performing Multiple Linear Regression

Getting Regression Statistics

Understanding the Regression Summary

Performing Linear Regression Without an Intercept

Performing Linear Regression with Interaction Terms

Selecting the Best Regression Variables

Regressing on a Subset of Your Data

Using an Expression Inside a Regression Formula

Regressing on a Polynomial

Regressing on Transformed Data

Finding the Best Power Transformation (Box–Cox Procedure)

Forming Confidence Intervals for Regression Coefficients

Plotting Regression Residuals

Diagnosing a Linear Regression

Identifying Influential Observations

Testing Residuals for Autocorrelation (Durbin–Watson Test)

Predicting New Values

Forming Prediction Intervals

Performing One-Way ANOVA

Creating an Interaction Plot

Finding Differences Between Means of Groups

Performing Robust ANOVA (Kruskal–Wallis Test)

Comparing Models by Using ANOVA


Chapter 12 Useful Tricks

Introduction

Peeking at Your Data

Widen Your Output

Printing the Result of an Assignment

Summing Rows and Columns

Printing Data in Columns

Binning Your Data

Finding the Position of a Particular Value

Selecting Every nth Element of a Vector

Finding Pairwise Minimums or Maximums

Generating All Combinations of Several Factors

Flatten a Data Frame

Sorting a Data Frame

Sorting by Two Columns

Stripping Attributes from a Variable

Revealing the Structure of an Object

Timing Your Code

Suppressing Warnings and Error Messages

Taking Function Arguments from a List

Defining Your Own Binary Operators


Chapter 13 Beyond Basic Numerics and Statistics

Introduction

Minimizing or Maximizing a Single-Parameter Function

Minimizing or Maximizing a Multiparameter Function

Calculating Eigenvalues and Eigenvectors

Performing Principal Component Analysis

Performing Simple Orthogonal Regression

Finding Clusters in Your Data

Predicting a Binary-Valued Variable (Logistic Regression)

Bootstrapping a Statistic

Factor Analysis


Chapter 14 Time Series Analysis

Introduction

Representing Time Series Data

Plotting Time Series Data

Extracting the Oldest or Newest Observations

Subsetting a Time Series

Merging Several Time Series

Filling or Padding a Time Series

Lagging a Time Series

Computing Successive Differences

Performing Calculations on Time Series

Computing a Moving Average

Applying a Function by Calendar Period

Applying a Rolling Function

Plotting the Autocorrelation Function

Testing a Time Series for Autocorrelation

Plotting the Partial Autocorrelation Function

Finding Lagged Correlations Between Two Time Series

Detrending a Time Series

Fitting an ARIMA Model

Removing Insignificant ARIMA Coefficients

Running Diagnostics on an ARIMA Model

Making Forecasts from an ARIMA Model

Testing for Mean Reversion

Smoothing a Time Series


Colophon