Mastering Predictive Analytics with R

Book description

Master the craft of predictive modeling by developing strategy, intuition, and a solid foundation in essential concepts

In Detail

R offers a free, open source environment that is well suited to both learning and deploying predictive modeling solutions in the real world. With its constantly growing community and extensive collection of packages, R provides the functionality to tackle a wide range of problems.

This book is designed to be both a guide and a reference for moving beyond the basics of predictive modeling. It begins with a dedicated chapter on the language of models and the predictive modeling process. Each subsequent chapter tackles a particular type of model, such as neural networks, and focuses on three questions: how the model works, how to use R to train it, and how to measure and assess its performance using real-world data sets.

By the end of this book, you will have explored and tested the most popular modeling techniques on real-world data sets and mastered a diverse range of approaches to predictive analytics.

What You Will Learn

  • Master the steps involved in the predictive modeling process
  • Learn how to classify predictive models and distinguish which models are suitable for a particular problem
  • Understand how and why each predictive model works
  • Recognize the assumptions, strengths, and weaknesses of a predictive model, and understand that no single model is best for every problem
  • Select appropriate metrics to assess the performance of different types of predictive models
  • Diagnose performance and accuracy problems when they arise and learn how to deal with them
  • Grow your expertise in using R and its diverse range of packages

Table of contents

  1. Mastering Predictive Analytics with R
    1. Table of Contents
    2. Mastering Predictive Analytics with R
    3. Credits
    4. About the Author
    5. Acknowledgments
    6. About the Reviewers
    7. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    8. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    9. 1. Gearing Up for Predictive Modeling
      1. Models
        1. Learning from data
        2. The core components of a model
        3. Our first model: k-nearest neighbors
      2. Types of models
        1. Supervised, unsupervised, semi-supervised, and reinforcement learning models
        2. Parametric and nonparametric models
        3. Regression and classification models
        4. Real-time and batch machine learning models
      3. The process of predictive modeling
        1. Defining the model's objective
        2. Collecting the data
        3. Picking a model
        4. Preprocessing the data
          1. Exploratory data analysis
          2. Feature transformations
          3. Encoding categorical features
          4. Missing data
          5. Outliers
          6. Removing problematic features
        5. Feature engineering and dimensionality reduction
        6. Training and assessing the model
        7. Repeating with different models and final model selection
        8. Deploying the model
      4. Performance metrics
        1. Assessing regression models
        2. Assessing classification models
          1. Assessing binary classification models
      5. Summary
    10. 2. Linear Regression
      1. Introduction to linear regression
        1. Assumptions of linear regression
      2. Simple linear regression
        1. Estimating the regression coefficients
      3. Multiple linear regression
        1. Predicting CPU performance
        2. Predicting the price of used cars
      4. Assessing linear regression models
        1. Residual analysis
        2. Significance tests for linear regression
        3. Performance metrics for linear regression
        4. Comparing different regression models
        5. Test set performance
      5. Problems with linear regression
        1. Multicollinearity
        2. Outliers
      6. Feature selection
      7. Regularization
        1. Ridge regression
        2. Least absolute shrinkage and selection operator (lasso)
        3. Implementing regularization in R
      8. Summary
    11. 3. Logistic Regression
      1. Classifying with linear regression
      2. Introduction to logistic regression
        1. Generalized linear models
        2. Interpreting coefficients in logistic regression
        3. Assumptions of logistic regression
        4. Maximum likelihood estimation
      3. Predicting heart disease
      4. Assessing logistic regression models
        1. Model deviance
        2. Test set performance
      5. Regularization with the lasso
      6. Classification metrics
      7. Extensions of the binary logistic classifier
        1. Multinomial logistic regression
          1. Predicting glass type
        2. Ordinal logistic regression
          1. Predicting wine quality
      8. Summary
    12. 4. Neural Networks
      1. The biological neuron
      2. The artificial neuron
      3. Stochastic gradient descent
        1. Gradient descent and local minima
        2. The perceptron algorithm
        3. Linear separation
        4. The logistic neuron
      4. Multilayer perceptron networks
        1. Training multilayer perceptron networks
      5. Predicting the energy efficiency of buildings
        1. Evaluating multilayer perceptrons for regression
      6. Predicting glass type revisited
      7. Predicting handwritten digits
        1. Receiver operating characteristic curves
      8. Summary
    13. 5. Support Vector Machines
      1. Maximal margin classification
      2. Support vector classification
        1. Inner products
      3. Kernels and support vector machines
      4. Predicting chemical biodegradation
      5. Cross-validation
      6. Predicting credit scores
      7. Multiclass classification with support vector machines
      8. Summary
    14. 6. Tree-based Methods
      1. The intuition for tree models
      2. Algorithms for training decision trees
        1. Classification and regression trees
          1. CART regression trees
          2. Tree pruning
          3. Missing data
        2. Regression model trees
        3. CART classification trees
        4. C5.0
      3. Predicting class membership on synthetic 2D data
      4. Predicting the authenticity of banknotes
      5. Predicting complex skill learning
        1. Tuning model parameters in CART trees
        2. Variable importance in tree models
        3. Regression model trees in action
      6. Summary
    15. 7. Ensemble Methods
      1. Bagging
        1. Margins and out-of-bag observations
        2. Predicting complex skill learning with bagging
        3. Predicting heart disease with bagging
        4. Limitations of bagging
      2. Boosting
        1. AdaBoost
      3. Predicting atmospheric gamma ray radiation
      4. Predicting complex skill learning with boosting
        1. Limitations of boosting
      5. Random forests
        1. The importance of variables in random forests
      6. Summary
    16. 8. Probabilistic Graphical Models
      1. A little graph theory
      2. Bayes' Theorem
      3. Conditional independence
      4. Bayesian networks
      5. The Naïve Bayes classifier
        1. Predicting the sentiment of movie reviews
      6. Hidden Markov models
      7. Predicting promoter gene sequences
      8. Predicting letter patterns in English words
      9. Summary
    17. 9. Time Series Analysis
      1. Fundamental concepts of time series
        1. Time series summary functions
      2. Some fundamental time series
        1. White noise
          1. Fitting a white noise time series
        2. Random walk
          1. Fitting a random walk
      3. Stationarity
      4. Stationary time series models
        1. Moving average models
        2. Autoregressive models
        3. Autoregressive moving average models
      5. Non-stationary time series models
        1. Autoregressive integrated moving average models
        2. Autoregressive conditional heteroscedasticity models
        3. Generalized autoregressive conditional heteroscedasticity models
      6. Predicting intense earthquakes
      7. Predicting lynx trappings
      8. Predicting foreign exchange rates
      9. Other time series models
      10. Summary
    18. 10. Topic Modeling
      1. An overview of topic modeling
      2. Latent Dirichlet Allocation
        1. The Dirichlet distribution
        2. The generative process
        3. Fitting an LDA model
      3. Modeling the topics of online news stories
        1. Model stability
        2. Finding the number of topics
        3. Topic distributions
        4. Word distributions
        5. LDA extensions
      4. Summary
    19. 11. Recommendation Systems
      1. Rating matrix
        1. Measuring user similarity
      2. Collaborative filtering
        1. User-based collaborative filtering
        2. Item-based collaborative filtering
      3. Singular value decomposition
      4. R and Big Data
      5. Predicting recommendations for movies and jokes
      6. Loading and preprocessing the data
      7. Exploring the data
        1. Evaluating binary top-N recommendations
        2. Evaluating non-binary top-N recommendations
        3. Evaluating individual predictions
      8. Other approaches to recommendation systems
      9. Summary
    20. Index

Product information

  • Title: Mastering Predictive Analytics with R
  • Author(s): Rui Miguel Forte
  • Release date: June 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781783982806