The Book of R

Book description

The Book of R is a comprehensive, beginner-friendly guide to R, the world's most popular programming language for statistical analysis. Even if you have no programming experience and little more than a grounding in the basics of mathematics, you'll find everything you need to begin using R effectively for statistical analysis.

You'll start with the basics, like how to handle data and write simple programs, before moving on to more advanced topics, like producing statistical summaries of your data and performing statistical tests and modeling. You'll even learn how to create impressive data visualizations with R's basic graphics tools and contributed packages, like ggplot2 and ggvis, as well as interactive 3D visualizations using the rgl package.

Dozens of hands-on exercises (with downloadable solutions) take you from theory to practice, as you learn:

  • The fundamentals of programming in R, including how to work with data frames, create functions, and use variables, statements, and loops
  • Statistical concepts like exploratory data analysis, probabilities, hypothesis tests, and regression modeling, and how to execute them in R
  • How to access R's thousands of functions, libraries, and data sets
  • How to draw valid and useful conclusions from your data
  • How to create publication-quality graphics of your results

Combining detailed explanations with real-world examples and exercises, this book will provide you with a solid understanding of both statistics and the depth of R's functionality. Make The Book of R your doorway into the growing world of data analysis.

Table of contents

  1. Cover
  2. Title
  3. Copyright
  4. Brief Contents
  5. Contents in Detail
  6. Preface
  7. Acknowledgments
  8. Introduction
    1. A Brief History of R
    2. About This Book
      1. Part I: The Language
      2. Part II: Programming
      3. Part III: Statistics and Probability
      4. Part IV: Statistical Testing and Modeling
      5. Part V: Advanced Graphics
    3. For Students
    4. For Instructors
  9. Part I: The Language
  10. Chapter 1: Getting Started
    1. 1.1 Obtaining and Installing R from CRAN
    2. 1.2 Opening R for the First Time
      1. 1.2.1 Console and Editor Panes
      2. 1.2.2 Comments
      3. 1.2.3 Working Directory
      4. 1.2.4 Installing and Loading R Packages
      5. 1.2.5 Help Files and Function Documentation
      6. 1.2.6 Third-Party Editors
    3. 1.3 Saving Work and Exiting R
      1. 1.3.1 Workspaces
      2. 1.3.2 Scripts
    4. 1.4 Conventions
      1. 1.4.1 Coding
      2. 1.4.2 Math and Equation References
      3. 1.4.3 Exercises
      4. Exercise 1.1
  11. Chapter 2: Numerics, Arithmetic, Assignment, and Vectors
    1. 2.1 R for Basic Math
      1. 2.1.1 Arithmetic
      2. 2.1.2 Logarithms and Exponentials
      3. 2.1.3 E-Notation
      4. Exercise 2.1
    2. 2.2 Assigning Objects
      1. Exercise 2.2
    3. 2.3 Vectors
      1. 2.3.1 Creating a Vector
      2. 2.3.2 Sequences, Repetition, Sorting, and Lengths
      3. Exercise 2.3
      4. 2.3.3 Subsetting and Element Extraction
      5. Exercise 2.4
      6. 2.3.4 Vector-Oriented Behavior
      7. Exercise 2.5
  12. Chapter 3: Matrices and Arrays
    1. 3.1 Defining a Matrix
      1. 3.1.1 Filling Direction
      2. 3.1.2 Row and Column Bindings
      3. 3.1.3 Matrix Dimensions
    2. 3.2 Subsetting
      1. 3.2.1 Row, Column, and Diagonal Extractions
      2. 3.2.2 Omitting and Overwriting
      3. Exercise 3.1
    3. 3.3 Matrix Operations and Algebra
      1. 3.3.1 Matrix Transpose
      2. 3.3.2 Identity Matrix
      3. 3.3.3 Scalar Multiple of a Matrix
      4. 3.3.4 Matrix Addition and Subtraction
      5. 3.3.5 Matrix Multiplication
      6. 3.3.6 Matrix Inversion
      7. Exercise 3.2
    4. 3.4 Multidimensional Arrays
      1. 3.4.1 Definition
      2. 3.4.2 Subsets, Extractions, and Replacements
      3. Exercise 3.3
  13. Chapter 4: Non-numeric Values
    1. 4.1 Logical Values
      1. 4.1.1 TRUE or FALSE?
      2. 4.1.2 A Logical Outcome: Relational Operators
      3. Exercise 4.1
      4. 4.1.3 Multiple Comparisons: Logical Operators
      5. Exercise 4.2
      6. 4.1.4 Logicals Are Numbers!
      7. 4.1.5 Logical Subsetting and Extraction
      8. Exercise 4.3
    2. 4.2 Characters
      1. 4.2.1 Creating a String
      2. 4.2.2 Concatenation
      3. 4.2.3 Escape Sequences
      4. 4.2.4 Substrings and Matching
      5. Exercise 4.4
    3. 4.3 Factors
      1. 4.3.1 Identifying Categories
      2. 4.3.2 Defining and Ordering Levels
      3. 4.3.3 Combining and Cutting
      4. Exercise 4.5
  14. Chapter 5: Lists and Data Frames
    1. 5.1 Lists of Objects
      1. 5.1.1 Definition and Component Access
      2. 5.1.2 Naming
      3. 5.1.3 Nesting
      4. Exercise 5.1
    2. 5.2 Data Frames
      1. 5.2.1 Construction
      2. 5.2.2 Adding Data Columns and Combining Data Frames
      3. 5.2.3 Logical Record Subsets
      4. Exercise 5.2
  15. Chapter 6: Special Values, Classes, and Coercion
    1. 6.1 Some Special Values
      1. 6.1.1 Infinity
      2. 6.1.2 NaN
      3. Exercise 6.1
      4. 6.1.3 NA
      5. 6.1.4 NULL
      6. Exercise 6.2
    2. 6.2 Understanding Types, Classes, and Coercion
      1. 6.2.1 Attributes
      2. 6.2.2 Object Class
      3. 6.2.3 Is-Dot Object-Checking Functions
      4. 6.2.4 As-Dot Coercion Functions
      5. Exercise 6.3
  16. Chapter 7: Basic Plotting
    1. 7.1 Using plot with Coordinate Vectors
    2. 7.2 Graphical Parameters
      1. 7.2.1 Automatic Plot Types
      2. 7.2.2 Title and Axis Labels
      3. 7.2.3 Color
      4. 7.2.4 Line and Point Appearances
      5. 7.2.5 Plotting Region Limits
    3. 7.3 Adding Points, Lines, and Text to an Existing Plot
      1. Exercise 7.1
    4. 7.4 The ggplot2 Package
      1. 7.4.1 A Quick Plot with qplot
      2. 7.4.2 Setting Appearance Constants with Geoms
      3. 7.4.3 Aesthetic Mapping with Geoms
      4. Exercise 7.2
  17. Chapter 8: Reading and Writing Files
    1. 8.1 R-Ready Data Sets
      1. 8.1.1 Built-in Data Sets
      2. 8.1.2 Contributed Data Sets
    2. 8.2 Reading in External Data Files
      1. 8.2.1 The Table Format
      2. 8.2.2 Spreadsheet Workbooks
      3. 8.2.3 Web-Based Files
      4. 8.2.4 Other File Formats
    3. 8.3 Writing Out Data Files and Plots
      1. 8.3.1 Data Sets
      2. 8.3.2 Plots and Graphics Files
    4. 8.4 Ad Hoc Object Read/Write Operations
      1. Exercise 8.1
  18. Part II: Programming
  19. Chapter 9: Calling Functions
    1. 9.1 Scoping
      1. 9.1.1 Environments
      2. 9.1.2 Search Path
      3. 9.1.3 Reserved and Protected Names
      4. Exercise 9.1
    2. 9.2 Argument Matching
      1. 9.2.1 Exact
      2. 9.2.2 Partial
      3. 9.2.3 Positional
      4. 9.2.4 Mixed
      5. 9.2.5 Dot-Dot-Dot: Use of Ellipses
      6. Exercise 9.2
  20. Chapter 10: Conditions and Loops
    1. 10.1 if Statements
      1. 10.1.1 Stand-Alone Statement
      2. 10.1.2 else Statements
      3. 10.1.3 Using ifelse for Element-wise Checks
      4. Exercise 10.1
      5. 10.1.4 Nesting and Stacking Statements
      6. 10.1.5 The switch Function
      7. Exercise 10.2
    2. 10.2 Coding Loops
      1. 10.2.1 for Loops
      2. Exercise 10.3
      3. 10.2.2 while Loops
      4. Exercise 10.4
      5. 10.2.3 Implicit Looping with apply
      6. Exercise 10.5
    3. 10.3 Other Control Flow Mechanisms
      1. 10.3.1 Declaring break or next
      2. 10.3.2 The repeat Statement
      3. Exercise 10.6
  21. Chapter 11: Writing Functions
    1. 11.1 The function Command
      1. 11.1.1 Function Creation
      2. 11.1.2 Using return
      3. Exercise 11.1
    2. 11.2 Arguments
      1. 11.2.1 Lazy Evaluation
      2. 11.2.2 Setting Defaults
      3. 11.2.3 Checking for Missing Arguments
      4. 11.2.4 Dealing with Ellipses
      5. Exercise 11.2
    3. 11.3 Specialized Functions
      1. 11.3.1 Helper Functions
      2. 11.3.2 Disposable Functions
      3. 11.3.3 Recursive Functions
      4. Exercise 11.3
  22. Chapter 12: Exceptions, Timings, and Visibility
    1. 12.1 Exception Handling
      1. 12.1.1 Formal Notifications: Errors and Warnings
      2. 12.1.2 Catching Errors with try Statements
      3. Exercise 12.1
    2. 12.2 Progress and Timing
      1. 12.2.1 Textual Progress Bars: Are We There Yet?
      2. 12.2.2 Measuring Completion Time: How Long Did It Take?
      3. Exercise 12.2
    3. 12.3 Masking
      1. 12.3.1 Function and Object Distinction
      2. 12.3.2 Data Frame Variable Distinction
  23. Part III: Statistics and Probability
  24. Chapter 13: Elementary Statistics
    1. 13.1 Describing Raw Data
      1. 13.1.1 Numeric Variables
      2. 13.1.2 Categorical Variables
      3. 13.1.3 Univariate and Multivariate Data
      4. 13.1.4 Parameter or Statistic?
      5. Exercise 13.1
    2. 13.2 Summary Statistics
      1. 13.2.1 Centrality: Mean, Median, Mode
      2. 13.2.2 Counts, Percentages, and Proportions
      3. Exercise 13.2
      4. 13.2.3 Quantiles, Percentiles, and the Five-Number Summary
      5. 13.2.4 Spread: Variance, Standard Deviation, and the Interquartile Range
      6. Exercise 13.3
      7. 13.2.5 Covariance and Correlation
      8. 13.2.6 Outliers
      9. Exercise 13.4
  25. Chapter 14: Basic Data Visualization
    1. 14.1 Barplots and Pie Charts
      1. 14.1.1 Building a Barplot
      2. 14.1.2 A Quick Pie Chart
    2. 14.2 Histograms
    3. 14.3 Box-and-Whisker Plots
      1. 14.3.1 Stand-Alone Boxplots
      2. 14.3.2 Side-by-Side Boxplots
    4. 14.4 Scatterplots
      1. 14.4.1 Single Plot
      2. 14.4.2 Matrix of Plots
      3. Exercise 14.1
  26. Chapter 15: Probability
    1. 15.1 What Is a Probability?
      1. 15.1.1 Events and Probability
      2. 15.1.2 Conditional Probability
      3. 15.1.3 Intersection
      4. 15.1.4 Union
      5. 15.1.5 Complement
      6. Exercise 15.1
    2. 15.2 Random Variables and Probability Distributions
      1. 15.2.1 Realizations
      2. 15.2.2 Discrete Random Variables
      3. 15.2.3 Continuous Random Variables
      4. 15.2.4 Shape, Skew, and Modality
      5. Exercise 15.2
  27. Chapter 16: Common Probability Distributions
    1. 16.1 Common Probability Mass Functions
      1. 16.1.1 Bernoulli Distribution
      2. 16.1.2 Binomial Distribution
      3. Exercise 16.1
      4. 16.1.3 Poisson Distribution
      5. Exercise 16.2
      6. 16.1.4 Other Mass Functions
    2. 16.2 Common Probability Density Functions
      1. 16.2.1 Uniform
      2. Exercise 16.3
      3. 16.2.2 Normal
      4. Exercise 16.4
      5. 16.2.3 Student’s t-distribution
      6. 16.2.4 Exponential
      7. Exercise 16.5
      8. 16.2.5 Other Density Functions
  28. Part IV: Statistical Testing and Modeling
  29. Chapter 17: Sampling Distributions and Confidence
    1. 17.1 Sampling Distributions
      1. 17.1.1 Distribution for a Sample Mean
      2. 17.1.2 Distribution for a Sample Proportion
      3. Exercise 17.1
      4. 17.1.3 Sampling Distributions for Other Statistics
    2. 17.2 Confidence Intervals
      1. 17.2.1 An Interval for a Mean
      2. 17.2.2 An Interval for a Proportion
      3. 17.2.3 Other Intervals
      4. 17.2.4 Comments on Interpretation of a CI
      5. Exercise 17.2
  30. Chapter 18: Hypothesis Testing
    1. 18.1 Components of a Hypothesis Test
      1. 18.1.1 Hypotheses
      2. 18.1.2 Test Statistic
      3. 18.1.3 p-value
      4. 18.1.4 Significance Level
      5. 18.1.5 Criticisms of Hypothesis Testing
    2. 18.2 Testing Means
      1. 18.2.1 Single Mean
      2. Exercise 18.1
      3. 18.2.2 Two Means
      4. Exercise 18.2
    3. 18.3 Testing Proportions
      1. 18.3.1 Single Proportion
      2. 18.3.2 Two Proportions
      3. Exercise 18.3
    4. 18.4 Testing Categorical Variables
      1. 18.4.1 Single Categorical Variable
      2. 18.4.2 Two Categorical Variables
      3. Exercise 18.4
    5. 18.5 Errors and Power
      1. 18.5.1 Hypothesis Test Errors
      2. 18.5.2 Type I Errors
      3. 18.5.3 Type II Errors
      4. Exercise 18.5
      5. 18.5.4 Statistical Power
      6. Exercise 18.6
  31. Chapter 19: Analysis of Variance
    1. 19.1 One-Way ANOVA
      1. 19.1.1 Hypotheses and Diagnostic Checking
      2. 19.1.2 One-Way ANOVA Table Construction
      3. 19.1.3 Building ANOVA Tables with the aov Function
      4. Exercise 19.1
    2. 19.2 Two-Way ANOVA
      1. 19.2.1 A Suite of Hypotheses
      2. 19.2.2 Main Effects and Interactions
    3. 19.3 Kruskal-Wallis Test
      1. Exercise 19.2
  32. Chapter 20: Simple Linear Regression
    1. 20.1 An Example of a Linear Relationship
    2. 20.2 General Concepts
      1. 20.2.1 Definition of the Model
      2. 20.2.2 Estimating the Intercept and Slope Parameters
      3. 20.2.3 Fitting Linear Models with lm
      4. 20.2.4 Illustrating Residuals
    3. 20.3 Statistical Inference
      1. 20.3.1 Summarizing the Fitted Model
      2. 20.3.2 Regression Coefficient Significance Tests
      3. 20.3.3 Coefficient of Determination
      4. 20.3.4 Other summary Output
    4. 20.4 Prediction
      1. 20.4.1 Confidence Interval or Prediction Interval?
      2. 20.4.2 Interpreting Intervals
      3. 20.4.3 Plotting Intervals
      4. 20.4.4 Interpolation vs. Extrapolation
      5. Exercise 20.1
    5. 20.5 Understanding Categorical Predictors
      1. 20.5.1 Binary Variables: k = 2
      2. 20.5.2 Multilevel Variables: k > 2
      3. 20.5.3 Changing the Reference Level
      4. 20.5.4 Treating Categorical Variables as Numeric
      5. 20.5.5 Equivalence with One-Way ANOVA
      6. Exercise 20.2
  33. Chapter 21: Multiple Linear Regression
    1. 21.1 Terminology
    2. 21.2 Theory
      1. 21.2.1 Extending the Simple Model to a Multiple Model
      2. 21.2.2 Estimating in Matrix Form
      3. 21.2.3 A Basic Example
    3. 21.3 Implementing in R and Interpreting
      1. 21.3.1 Additional Predictors
      2. 21.3.2 Interpreting Marginal Effects
      3. 21.3.3 Visualizing the Multiple Linear Model
      4. 21.3.4 Finding Confidence Intervals
      5. 21.3.5 Omnibus F-Test
      6. 21.3.6 Predicting from a Multiple Linear Model
      7. Exercise 21.1
    4. 21.4 Transforming Numeric Variables
      1. 21.4.1 Polynomial
      2. 21.4.2 Logarithmic
      3. 21.4.3 Other Transformations
      4. Exercise 21.2
    5. 21.5 Interactive Terms
      1. 21.5.1 Concept and Motivation
      2. 21.5.2 One Categorical, One Continuous
      3. 21.5.3 Two Categorical
      4. 21.5.4 Two Continuous
      5. 21.5.5 Higher-Order Interactions
      6. Exercise 21.3
  34. Chapter 22: Linear Model Selection and Diagnostics
    1. 22.1 Goodness-of-Fit vs. Complexity
      1. 22.1.1 Principle of Parsimony
      2. 22.1.2 General Guidelines
    2. 22.2 Model Selection Algorithms
      1. 22.2.1 Nested Comparisons: The Partial F-Test
      2. 22.2.2 Forward Selection
      3. 22.2.3 Backward Selection
      4. 22.2.4 Stepwise AIC Selection
      5. Exercise 22.1
      6. 22.2.5 Other Selection Algorithms
    3. 22.3 Residual Diagnostics
      1. 22.3.1 Inspecting and Interpreting Residuals
      2. 22.3.2 Assessing Normality
      3. 22.3.3 Illustrating Outliers, Leverage, and Influence
      4. 22.3.4 Calculating Leverage
      5. 22.3.5 Cook’s Distance
      6. 22.3.6 Graphically Combining Residuals, Leverage, and Cook’s Distance
      7. Exercise 22.2
    4. 22.4 Collinearity
      1. 22.4.1 Potential Warning Signs
      2. 22.4.2 Correlated Predictors: A Quick Example
  35. Part V: Advanced Graphics
  36. Chapter 23: Advanced Plot Customization
    1. 23.1 Handling the Graphics Device
      1. 23.1.1 Manually Opening a New Device
      2. 23.1.2 Switching Between Devices
      3. 23.1.3 Closing a Device
      4. 23.1.4 Multiple Plots in One Device
    2. 23.2 Plotting Regions and Margins
      1. 23.2.1 Default Spacing
      2. 23.2.2 Custom Spacing
      3. 23.2.3 Clipping
    3. 23.3 Point-and-Click Coordinate Interaction
      1. 23.3.1 Retrieving Coordinates Silently
      2. 23.3.2 Visualizing Selected Coordinates
      3. 23.3.3 Ad Hoc Annotation
      4. Exercise 23.1
    4. 23.4 Customizing Traditional R Plots
      1. 23.4.1 Graphical Parameters for Style and Suppression
      2. 23.4.2 Customizing Boxes
      3. 23.4.3 Customizing Axes
    5. 23.5 Specialized Text and Label Notation
      1. 23.5.1 Font
      2. 23.5.2 Greek Symbols
      3. 23.5.3 Mathematical Expressions
    6. 23.6 A Fully Annotated Scatterplot
      1. Exercise 23.2
  37. Chapter 24: Going Further with the Grammar of Graphics
    1. 24.1 ggplot or qplot?
    2. 24.2 Smoothing and Shading
      1. 24.2.1 Adding LOESS Trends
      2. 24.2.2 Constructing Smooth Density Estimates
    3. 24.3 Multiple Plots and Variable-Mapped Facets
      1. 24.3.1 Independent Plots
      2. 24.3.2 Facets Mapped to a Categorical Variable
      3. Exercise 24.1
    4. 24.4 Interactive Tools in ggvis
      1. Exercise 24.2
  38. Chapter 25: Defining Colors and Plotting in Higher Dimensions
    1. 25.1 Representing and Using Color
      1. 25.1.1 Red-Green-Blue Hexadecimal Color Codes
      2. 25.1.2 Built-in Palettes
      3. 25.1.3 Custom Palettes
      4. 25.1.4 Using Color Palettes to Index a Continuum
      5. 25.1.5 Including a Color Legend
      6. 25.1.6 Opacity
      7. 25.1.7 RGB Alternatives and Further Functionality
      8. Exercise 25.1
    2. 25.2 3D Scatterplots
      1. 25.2.1 Basic Syntax
      2. 25.2.2 Visual Enhancements
      3. Exercise 25.2
    3. 25.3 Preparing a Surface for Plotting
      1. 25.3.1 Constructing an Evaluation Grid
      2. 25.3.2 Constructing the z-Matrix
      3. 25.3.3 Conceptualizing the z-Matrix
    4. 25.4 Contour Plots
      1. 25.4.1 Drawing Contour Lines
      2. 25.4.2 Color-Filled Contours
      3. Exercise 25.3
    5. 25.5 Pixel Images
      1. 25.5.1 One Grid Point = One Pixel
      2. 25.5.2 Surface Truncation and Empty Pixels
      3. Exercise 25.4
    6. 25.6 Perspective Plots
      1. 25.6.1 Basic Plots and Angle Adjustment
      2. 25.6.2 Coloring Facets
      3. 25.6.3 Rotating with Loops
      4. Exercise 25.5
  39. Chapter 26: Interactive 3D Plots
    1. 26.1 Point Clouds
      1. 26.1.1 Basic 3D Cloud
      2. 26.1.2 Visual Enhancements and Legends
      3. 26.1.3 Adding Further 3D Components
      4. Exercise 26.1
    2. 26.2 Bivariate Surfaces
      1. 26.2.1 Basic Perspective Surface
      2. 26.2.2 Additional Components
      3. 26.2.3 Coloring by z Value
      4. 26.2.4 Dealing with the Aspect Ratio
      5. Exercise 26.2
    3. 26.3 Trivariate Surfaces
      1. 26.3.1 Evaluation Coordinates in 3D
      2. 26.3.2 Isosurfaces
      3. 26.3.3 Example: Nonparametric Trivariate Density
    4. 26.4 Handling Parametric Equations
      1. 26.4.1 Simple Loci
      2. 26.4.2 Mathematical Abstractions
      3. Exercise 26.3
  40. Appendix A: Installing R and Contributed Packages
    1. A.1 Downloading and Installing R
    2. A.2 Using Packages
      1. A.2.1 Base Packages
      2. A.2.2 Recommended Packages
      3. A.2.3 Contributed Packages
    3. A.3 Updating R and Installed Packages
    4. A.4 Using Other Mirrors and Repositories
      1. A.4.1 Switching CRAN Mirror
      2. A.4.2 Other Package Repositories
    5. A.5 Citing and Writing Packages
      1. A.5.1 Citing R and Contributed Packages
      2. A.5.2 Writing Your Own Packages
  41. Appendix B: Working with RStudio
    1. B.1 Basic Layout and Usage
      1. B.1.1 Editor Features and Appearance Options
      2. B.1.2 Customizing Panes
    2. B.2 Auxiliary Tools
      1. B.2.1 Projects
      2. B.2.2 Package Installer and Updater
      3. B.2.3 Support for Debugging
      4. B.2.4 Markup, Document, and Graphics Tools
  42. Reference List
  43. Index

Product information

  • Title: The Book of R
  • Author(s): Tilman M. Davies
  • Release date: July 2016
  • Publisher(s): No Starch Press
  • ISBN: 9781593276515