R in a Nutshell, 2nd Edition

Book description

If you’re considering R for statistical computing and data visualization, this book provides a quick and practical guide to just about everything you can do with the open source R language and software environment. You’ll learn how to write R functions and use R packages to help you prepare, visualize, and analyze data. Author Joseph Adler illustrates each process with a wealth of examples from medicine, business, and sports.

Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.

  • Get started quickly with an R tutorial and hundreds of examples
  • Explore R syntax, objects, and other language details
  • Find thousands of user-contributed R packages online, including Bioconductor
  • Learn how to use R to prepare data for analysis
  • Visualize your data with R’s graphics, lattice, and ggplot2 packages
  • Use R to calculate statistical tests, fit models, and compute probability distributions
  • Speed up intensive computations by writing parallel R programs for Hadoop
  • Get a complete desktop reference to R

Table of contents

  1. R in a Nutshell
  2. Preface
    1. Why I Wrote This Book
    2. When Should You Use R?
    3. What’s New in the Second Edition?
    4. R License Terms
    5. Examples
    6. How This Book Is Organized
    7. Conventions Used in This Book
    8. Using Code Examples
    9. Safari® Books Online
    10. How to Contact Us
    11. Acknowledgments
  3. I. R Basics
    1. 1. Getting and Installing R
      1. R Versions
      2. Getting and Installing Interactive R Binaries
        1. Windows
        2. Mac OS X
        3. Linux and Unix Systems
          1. Installation using package management systems
          2. Installing R from downloaded files
    2. 2. The R User Interface
      1. The R Graphical User Interface
        1. Windows
        2. Mac OS X
        3. Linux and Unix
      2. The R Console
        1. Command-Line Editing
      3. Batch Mode
      4. Using R Inside Microsoft Excel
      5. RStudio
      6. Other Ways to Run R
    3. 3. A Short R Tutorial
      1. Basic Operations in R
      2. Functions
      3. Variables
      4. Introduction to Data Structures
      5. Objects and Classes
      6. Models and Formulas
      7. Charts and Graphics
      8. Getting Help
    4. 4. R Packages
      1. An Overview of Packages
      2. Listing Packages in Local Libraries
      3. Loading Packages
        1. Loading Packages on Windows and Linux
        2. Loading Packages on Mac OS X
      4. Exploring Package Repositories
        1. Exploring R Package Repositories on the Web
        2. Finding and Installing Packages Inside R
          1. Windows and Linux GUIs
          2. Mac OS X GUI
          3. R console
          4. Installing from the command line
      5. Installing Packages From Other Repositories
      6. Custom Packages
        1. Creating a Package Directory
        2. Building the Package
  4. II. The R Language
    1. 5. An Overview of the R Language
      1. Expressions
      2. Objects
      3. Symbols
      4. Functions
      5. Objects Are Copied in Assignment Statements
      6. Everything in R Is an Object
      7. Special Values
        1. NA
        2. Inf and -Inf
        3. NaN
        4. NULL
      8. Coercion
      9. The R Interpreter
      10. Seeing How R Works
    2. 6. R Syntax
      1. Constants
        1. Numeric Vectors
        2. Character Vectors
        3. Symbols
      2. Operators
        1. Order of Operations
        2. Assignments
      3. Expressions
        1. Separating Expressions
        2. Parentheses
        3. Curly Braces
      4. Control Structures
        1. Conditional Statements
        2. Loops
      5. Accessing Data Structures
        1. Data Structure Operators
        2. Indexing by Integer Vector
        3. Indexing by Logical Vector
        4. Indexing by Name
      6. R Code Style Standards
    3. 7. R Objects
      1. Primitive Object Types
      2. Vectors
      3. Lists
      4. Other Objects
        1. Matrices
        2. Arrays
        3. Factors
        4. Data Frames
        5. Formulas
        6. Time Series
        7. Shingles
        8. Dates and Times
        9. Connections
      5. Attributes
        1. Class
    4. 8. Symbols and Environments
      1. Symbols
      2. Working with Environments
      3. The Global Environment
      4. Environments and Functions
        1. Working with the Call Stack
        2. Evaluating Functions in Different Environments
        3. Adding Objects to an Environment
      5. Exceptions
        1. Signaling Errors
        2. Catching Errors
    5. 9. Functions
      1. The Function Keyword
      2. Arguments
      3. Return Values
      4. Functions as Arguments
        1. Anonymous Functions
        2. Properties of Functions
      5. Argument Order and Named Arguments
      6. Side Effects
        1. Changes to Other Environments
        2. Input/Output
        3. Graphics
    6. 10. Object-Oriented Programming
      1. Overview of Object-Oriented Programming in R
        1. Key Ideas
        2. Implementation Example
      2. Object-Oriented Programming in R: S4 Classes
        1. Defining Classes
        2. New Objects
        3. Accessing Slots
        4. Working with Objects
        5. Creating Coercion Methods
        6. Methods
        7. Managing Methods
        8. Basic Classes
        9. More Help
      3. Old-School OOP in R: S3
        1. S3 Classes
        2. S3 Methods
        3. Using S3 Classes in S4 Classes
        4. Finding Hidden S3 Methods
  5. III. Working with Data
    1. 11. Saving, Loading, and Editing Data
      1. Entering Data Within R
        1. Entering Data Using R Commands
        2. Using the Edit GUI
          1. Windows Data Editor
          2. Mac OS X Data Editor
          3. X Windows (Linux) Data Editor
      2. Saving and Loading R Objects
        1. Saving Objects with save
      3. Importing Data from External Files
        1. Text Files
          1. Delimited files
          2. Fixed-width files
          3. Other functions to parse data
        2. Other Software
      4. Exporting Data
      5. Importing Data From Databases
        1. Export Then Import
        2. Database Connection Packages
        3. RODBC
          1. Getting RODBC working
            1. Installing the RODBC package
            2. Installing ODBC drivers
            3. Example: SQLite ODBC on Mac OS X
            4. Example: SQLite ODBC on Windows
          2. Using RODBC
            1. Opening a channel
            2. Getting information about the database
            3. Getting data
            4. Closing a channel
        4. DBI
          1. Opening a connection
          2. Getting DB information
          3. Querying the database
          4. Cleaning up
        5. TSDBI
      6. Getting Data from Hadoop
    2. 12. Preparing Data
      1. Combining Data Sets
        1. Pasting Together Data Structures
          1. Paste
          2. rbind and cbind
          3. An extended example
        2. Merging Data by Common Fields
      2. Transformations
        1. Reassigning Variables
        2. The Transform Function
        3. Applying a Function to Each Element of an Object
          1. Applying a function to an array
          2. Applying a function to a list or vector
          3. The plyr library
      3. Binning Data
        1. Shingles
        2. Cut
        3. Combining Objects with a Grouping Variable
      4. Subsets
        1. Bracket Notation
        2. subset Function
        3. Random Sampling
      5. Summarizing Functions
        1. tapply, aggregate
        2. Aggregating Tables with rowsum
        3. Counting Values
        4. Reshaping Data
          1. Transposing matrices and data frames
          2. Reshaping data frames and matrices
          3. Using the Reshape Library
            1. Melting and Casting
            2. Examples of reshape
            3. melt
            4. Cast
      6. Data Cleaning
      7. Finding and Removing Duplicates
      8. Sorting
  6. IV. Data Visualization
    1. 13. Graphics
      1. An Overview of R Graphics
        1. Scatter Plots
        2. Plotting Time Series
        3. Bar Charts
        4. Pie Charts
        5. Plotting Categorical Data
        6. Three-Dimensional Data
        7. Plotting Distributions
        8. Box Plots
      2. Graphics Devices
      3. Customizing Charts
        1. Common Arguments to Chart Functions
        2. Graphical Parameters
          1. Annotation
          2. Margins
          3. Multiple plots
          4. Text properties
            1. Text size
            2. Typeface
            3. Alignment and spacing
            4. Rotation
          5. Line properties
          6. Colors
          7. Axes
          8. Points
          9. Graphical parameters by name
        3. Basic Graphics Functions
          1. points
          2. lines
          3. curve
          4. text
          5. abline
          6. polygon
          7. segments
          8. legend
          9. title
          10. axis
          11. box
          12. mtext
          13. trans3d
    2. 14. Lattice Graphics
      1. History
      2. An Overview of the Lattice Package
        1. How Lattice Works
        2. A Simple Example
        3. Using Lattice Functions
        4. Custom Panel Functions
      3. High-Level Lattice Plotting Functions
        1. Univariate Trellis Plots
          1. Bar charts
          2. Dot plots
          3. Histograms
          4. Density plots
          5. Strip plots
          6. Univariate quantile-quantile plots
        2. Bivariate Trellis Plots
          1. Scatter plots
          2. Box plots in lattice
          3. Scatter plots matrices
          4. Bivariate quantile-quantile plots
        3. Trivariate Plots
          1. Level plots
          2. Contour plots
          3. Cloud plots
          4. Wire-frame plots
        4. Other Plots
      4. Customizing Lattice Graphics
        1. Common Arguments to Lattice Functions
        2. trellis.skeleton
        3. Controlling How Axes Are Drawn
        4. Parameters
        5. plot.trellis
        6. strip.default
        7. simpleKey
      5. Low-Level Functions
        1. Low-Level Graphics Functions
        2. Panel Functions
    3. 15. ggplot2
      1. A Short Introduction
      2. The Grammar of Graphics
      3. A More Complex Example: Medicare Data
      4. Quick Plot
      5. Creating Graphics with ggplot2
      6. Learning More
  7. V. Statistics with R
    1. 16. Analyzing Data
      1. Summary Statistics
      2. Correlation and Covariance
      3. Principal Components Analysis
      4. Factor Analysis
      5. Bootstrap Resampling
    2. 17. Probability Distributions
      1. Normal Distribution
      2. Common Distribution-Type Arguments
      3. Distribution Function Families
    3. 18. Statistical Tests
      1. Continuous Data
        1. Normal Distribution-Based Tests
          1. Comparing means
          2. Comparing paired data
          3. Comparing variances of two populations
          4. Comparing means across more than two groups
          5. Pairwise t-tests between multiple groups
          6. Testing for normality
          7. Testing if a data vector came from an arbitrary distribution
          8. Testing if two data vectors came from the same distribution
          9. Correlation tests
        2. Non-Parametric Tests
          1. Comparing two means
          2. Comparing more than two means
          3. Comparing variances
          4. Difference in scale parameters
      2. Discrete Data
        1. Proportion Tests
        2. Binomial Tests
        3. Tabular Data Tests
        4. Non-Parametric Tabular Data Tests
    4. 19. Power Tests
      1. Experimental Design Example
      2. t-Test Design
      3. Proportion Test Design
      4. ANOVA Test Design
    5. 20. Regression Models
      1. Example: A Simple Linear Model
        1. Fitting a Model
        2. Helper Functions for Specifying the Model
        3. Getting Information About a Model
          1. Viewing the model
          2. Predicting values using a model
          3. Analyzing the fit
        4. Refining the Model
      2. Details About the lm Function
        1. Assumptions of Least Squares Regression
        2. Robust and Resistant Regression
          1. Resistant regression
          2. Robust regression
          3. Comparing lm, lqs, and rlm
      3. Subset Selection and Shrinkage Methods
        1. Stepwise Variable Selection
        2. Ridge Regression
        3. Lasso and Least Angle Regression
        4. elasticnet
        5. Principal Components Regression and Partial Least Squares Regression
      4. Nonlinear Models
        1. Generalized Linear Models
        2. glmnet
        3. Nonlinear Least Squares
      5. Survival Models
      6. Smoothing
        1. Splines
        2. Fitting Polynomial Surfaces
        3. Kernel Smoothing
      7. Machine Learning Algorithms for Regression
        1. Regression Tree Models
          1. Recursive partitioning trees
          2. Patient rule induction method
          3. Bagging for regression
          4. Boosting for regression
          5. Random forests for regression
        2. MARS
        3. Neural Networks
        4. Project Pursuit Regression
        5. Generalized Additive Models
        6. Support Vector Machines
    6. 21. Classification Models
      1. Linear Classification Models
        1. Logistic Regression
        2. Linear Discriminant Analysis
        3. Log-Linear Models
      2. Machine Learning Algorithms for Classification
        1. k Nearest Neighbors
        2. Classification Tree Models
          1. Bagging
          2. Boosting
        3. Neural Networks
        4. SVMs
        5. Random Forests
    7. 22. Machine Learning
      1. Market Basket Analysis
      2. Clustering
        1. Distance Measures
        2. Clustering Algorithms
    8. 23. Time Series Analysis
      1. Autocorrelation Functions
      2. Time Series Models
  8. VI. Additional Topics
    1. 24. Optimizing R Programs
      1. Measuring R Program Performance
        1. Timing
        2. Profiling
        3. Monitor How Much Memory You Are Using
        4. Profiling Memory Usage
      2. Optimizing Your R Code
        1. Using Vector Operations
          1. Iterative algorithms and vector operations
          2. Transforming problems to use built-in functions
        2. Lookup Performance in R
          1. Lookups and R objects
          2. Using environment objects in place of vectors
        3. Use a Database to Query Large Data Sets
        4. Preallocate Memory
        5. Cleaning Up Memory
        6. Functions for Big Data Sets
      3. Other Ways to Speed Up R
        1. The R Byte Code Compiler
          1. Manual compilation
          2. Inspecting byte code
          3. Just-in-time compilation
        2. High-Performance R Binaries
          1. Revolution R
          2. Building your own
            1. Building on Microsoft Windows
            2. Building R on Unix-like systems
            3. Building R on Mac OS X
    2. 25. Bioconductor
      1. An Example
        1. Loading Raw Expression Data
        2. Loading Data from GEO
        3. Matching Phenotype Data
        4. Analyzing Expression Data
      2. Key Bioconductor Packages
      3. Data Structures
        1. eSet
        2. AssayData
        3. AnnotatedDataFrame
        4. MIAME
        5. Other Classes Used by Bioconductor Packages
      4. Where to Go Next
        1. Resources Outside Bioconductor
        2. Vignettes
        3. Courses
        4. Books
    3. 26. R and Hadoop
      1. R and Hadoop
        1. Overview of Hadoop
          1. Map/Reduce
          2. Distributed data storage
          3. Managing a cluster of servers
          4. Java framework
          5. When should you consider Hadoop?
        2. RHadoop
          1. Make sure Hadoop is installed locally
          2. Installing RHadoop locally
          3. An example RHadoop application
          4. Details of rmr
          5. Learning more
        3. Hadoop Streaming
        4. Learning More
      2. Other Packages for Parallel Computation with R
        1. Segue
        2. doMC
      3. Where to Learn More
  9. A. R Reference
    1. base
      1. Functions
      2. Data Sets
    2. boot
      1. Functions
      2. Data Sets
    3. class
      1. Functions
    4. cluster
      1. Functions
      2. Data Sets
    5. codetools
    6. foreign
      1. Functions
    7. grDevices
      1. Functions
      2. Data Sets
    8. graphics
      1. Functions
    9. grid
    10. KernSmooth
      1. Functions
    11. lattice
      1. Functions
      2. Data Sets
    12. MASS
      1. Functions
      2. Data Sets
    13. methods
      1. Functions
    14. mgcv
    15. nlme
    16. nnet
      1. Functions
    17. rpart
      1. Functions
      2. Data Sets
    18. spatial
      1. Functions
    19. splines
      1. Functions
    20. stats
      1. Functions
      2. Data Set
    21. stats4
      1. Functions
    22. survival
      1. Functions
      2. Data Sets
    23. tcltk
    24. tools
      1. Functions
      2. Data Sets
    25. utils
      1. Functions
  10. Bibliography
  11. Index
  12. About the Author
  13. Colophon
  14. Copyright

Product information

  • Title: R in a Nutshell, 2nd Edition
  • Author(s): Joseph Adler
  • Release date: October 2012
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781449312084