Predictive Modeling with SAS Enterprise Miner, 3rd Edition

Book description

A step-by-step guide to predictive modeling! Kattamuri Sarma's Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition, will show you how to develop and test predictive models quickly using SAS Enterprise Miner. Using realistic data, the book explains complex methods in a simple and practical way to readers from different backgrounds and industries.

Incorporating the latest version of Enterprise Miner, this third edition also expands the section on time series. Written for business analysts, data scientists, statisticians, students, predictive modelers, and data miners, this comprehensive text provides examples that will strengthen your understanding of the essential concepts and methods of predictive modeling. Topics covered include logistic regression, regression, decision trees, neural networks, variable clustering, observation clustering, data imputation, binning, data exploration, variable selection, variable transformation, and much more, including analysis of textual data.

Develop predictive models quickly, learn how to test numerous models and compare the results, gain an in-depth understanding of predictive models and multivariate methods, and discover how to do in-depth analysis. Do it all with Predictive Modeling with SAS Enterprise Miner!

Table of contents

  1. About This Book
  2. About The Author
  3. Chapter 1: Research Strategy
  4. 1.1 Introduction
  5. 1.2 Types of Inputs
    1. 1.2.1 Measurement Scales for Variables
    2. 1.2.2 Predictive Models with Textual Data
  6. 1.3 Defining the Target
    1. 1.3.1 Predicting Response to Direct Mail
    2. 1.3.2 Predicting Risk in the Auto Insurance Industry
    3. 1.3.3 Predicting Rate Sensitivity of Bank Deposit Products
    4. 1.3.4 Predicting Customer Attrition
    5. 1.3.5 Predicting a Nominal Categorical (Unordered Polychotomous) Target
  7. 1.4 Sources of Modeling Data
    1. 1.4.1 Comparability between the Sample and Target Universe
    2. 1.4.2 Observation Weights
  8. 1.5 Pre-Processing the Data
    1. 1.5.1 Data Cleaning Before Launching SAS Enterprise Miner
    2. 1.5.2 Data Cleaning After Launching SAS Enterprise Miner
  9. 1.6 Alternative Modeling Strategies
    1. 1.6.1 Regression with a Moderate Number of Input Variables
    2. 1.6.2 Regression with a Large Number of Input Variables
  10. 1.7 Notes
  11. Chapter 2: Getting Started with Predictive Modeling
  12. 2.1 Introduction
  13. 2.2 Opening SAS Enterprise Miner 14.1
  14. 2.3 Creating a New Project in SAS Enterprise Miner 14.1
  15. 2.4 The SAS Enterprise Miner Window
  16. 2.5 Creating a SAS Data Source
  17. 2.6 Creating a Process Flow Diagram
  18. 2.7 Sample Nodes
    1. 2.7.1 Input Data Node
    2. 2.7.2 Data Partition Node
    3. 2.7.3 Filter Node
    4. 2.7.4 File Import Node
    5. 2.7.5 Time Series Nodes
    6. 2.7.6 Merge Node
    7. 2.7.7 Append Node
  19. 2.8 Tools for Initial Data Exploration
    1. 2.8.1 Stat Explore Node
    2. 2.8.2 MultiPlot Node
    3. 2.8.3 Graph Explore Node
    4. 2.8.4 Variable Clustering Node
    5. 2.8.5 Cluster Node
    6. 2.8.6 Variable Selection Node
  20. 2.9 Tools for Data Modification
    1. 2.9.1 Drop Node
    2. 2.9.2 Replacement Node
    3. 2.9.3 Impute Node
    4. 2.9.4 Interactive Binning Node
    5. 2.9.5 Principal Components Node
    6. 2.9.6 Transform Variables Node
  21. 2.10 Utility Nodes
    1. 2.10.1 SAS Code Node
  22. 2.11 Appendix to Chapter 2
    1. 2.11.1 The Type, the Measurement Scale, and the Number of Levels of a Variable
    2. 2.11.2 Eigenvalues, Eigenvectors, and Principal Components
    3. 2.11.3 Cramer’s V
    4. 2.11.4 Calculation of Chi-Square Statistic and Cramer’s V for a Continuous Input
  23. 2.12 Exercises
  24. Notes
  25. Chapter 3: Variable Selection and Transformation of Variables
  26. 3.1 Introduction
  27. 3.2 Variable Selection
    1. 3.2.1 Continuous Target with Numeric Interval-scaled Inputs (Case 1)
    2. 3.2.2 Continuous Target with Nominal-Categorical Inputs (Case 2)
    3. 3.2.3 Binary Target with Numeric Interval-scaled Inputs (Case 3)
    4. 3.2.4 Binary Target with Nominal-scaled Categorical Inputs (Case 4)
  28. 3.3 Variable Selection Using the Variable Clustering Node
    1. 3.3.1 Selection of the Best Variable from Each Cluster
    2. 3.3.2 Selecting the Cluster Components
  29. 3.4 Variable Selection Using the Decision Tree Node
  30. 3.5 Transformation of Variables
    1. 3.5.1 Transform Variables Node
    2. 3.5.2 Transformation before Variable Selection
    3. 3.5.3 Transformation after Variable Selection
    4. 3.5.4 Passing More Than One Type of Transformation for Each Interval Input to the Next Node
    5. 3.5.5 Saving and Exporting the Code Generated by the Transform Variables Node
  31. 3.6 Summary
  32. 3.7 Appendix to Chapter 3
    1. 3.7.1 Changing the Measurement Scale of a Variable in a Data Source
    2. 3.7.2 SAS Code for Comparing Grouped Categorical Variables with the Ungrouped Variables
  33. Exercises
  34. Note
  35. Chapter 4: Building Decision Tree Models to Predict Response and Risk
  36. 4.1 Introduction
  37. 4.2 An Overview of the Tree Methodology in SAS® Enterprise Miner™
    1. 4.2.1 Decision Trees
    2. 4.2.2 Decision Tree Models
    3. 4.2.3 Decision Tree Models vs. Logistic Regression Models
    4. 4.2.4 Applying the Decision Tree Model to Prospect Data
    5. 4.2.5 Calculation of the Worth of a Tree
    6. 4.2.6 Roles of the Training and Validation Data in the Development of a Decision Tree
    7. 4.2.7 Regression Tree
  38. 4.3 Development of the Tree in SAS Enterprise Miner
    1. 4.3.1 Growing an Initial Tree
    2. 4.3.2 P-value Adjustment Options
    3. 4.3.3 Controlling Tree Growth: Stopping Rules
    4. 4.3.3.1 Controlling Tree Growth through the Split Size Property
    5. 4.3.4 Pruning: Selecting the Right-Sized Tree Using Validation Data
    6. 4.3.5 Step-by-Step Illustration of Growing and Pruning a Tree
    7. 4.3.6 Average Profit vs. Total Profit for Comparing Trees of Different Sizes
    8. 4.3.7 Accuracy/Misclassification Criterion in Selecting the Right-sized Tree (Classification of Records and Nodes by Maximizing Accuracy)
    9. 4.3.8 Assessment of a Tree or Sub-tree Using Average Square Error
    10. 4.3.9 Selection of the Right-sized Tree
  39. 4.4 Decision Tree Model to Predict Response to Direct Marketing
    1. 4.4.1 Testing Model Performance with a Test Data Set
    2. 4.4.2 Applying the Decision Tree Model to Score a Data Set
  40. 4.5 Developing a Regression Tree Model to Predict Risk
    1. 4.5.1 Summary of the Regression Tree Model to Predict Risk
  41. 4.6 Developing Decision Trees Interactively
    1. 4.6.1 Interactively Modifying an Existing Decision Tree
    2. 4.6.3 Developing the Maximal Tree in Interactive Mode
  42. 4.7 Summary
  43. 4.8 Appendix to Chapter 4
    1. 4.8.1 Pearson’s Chi-Square Test
    2. 4.8.2 Calculation of Impurity Reduction using Gini Index
    3. 4.8.3 Calculation of Impurity Reduction/Information Gain using Entropy
    4. 4.8.4 Adjusting the Predicted Probabilities for Over-sampling
    5. 4.8.5 Expected Profits Using Unadjusted Probabilities
    6. 4.8.6 Expected Profits Using Adjusted Probabilities
  44. 4.9 Exercises
  45. Notes
  46. Chapter 5: Neural Network Models to Predict Response and Risk
  47. 5.1 Introduction
    1. 5.1.1 Target Variables for the Models
    2. 5.1.2 Neural Network Node Details
  48. 5.2 General Example of a Neural Network Model
    1. 5.2.1 Input Layer
    2. 5.2.2 Hidden Layers
    3. 5.2.3 Output Layer or Target Layer
    4. 5.2.4 Activation Function of the Output Layer
  49. 5.3 Estimation of Weights in a Neural Network Model
  50. 5.4 Neural Network Model to Predict Response
    1. 5.4.1 Setting the Neural Network Node Properties
    2. 5.4.2 Assessing the Predictive Performance of the Estimated Model
    3. 5.4.3 Receiver Operating Characteristic (ROC) Charts
    4. 5.4.4 How Did the Neural Network Node Pick the Optimum Weights for This Model?
    5. 5.4.5 Scoring a Data Set Using the Neural Network Model
    6. 5.4.6 Score Code
  51. 5.5 Neural Network Model to Predict Loss Frequency in Auto Insurance
    1. 5.5.1 Loss Frequency as an Ordinal Target
    2. 5.5.1.1 Target Layer Combination and Activation Functions
    3. 5.5.3 Classification of Risks for Rate Setting in Auto Insurance with Predicted Probabilities
  52. 5.6 Alternative Specifications of the Neural Networks
    1. 5.6.1 A Multilayer Perceptron (MLP) Neural Network
    2. 5.6.2 Radial Basis Function (RBF) Neural Network
  53. 5.7 Comparison of Alternative Built-in Architectures of the Neural Network Node
    1. 5.7.1 Multilayer Perceptron (MLP) Network
    2. 5.7.2 Ordinary Radial Basis Function with Equal Heights and Widths (ORBFEQ)
    3. 5.7.3 Ordinary Radial Basis Function with Equal Heights and Unequal Widths (ORBFUN)
    4. 5.7.4 Normalized Radial Basis Function with Equal Widths and Heights (NRBFEQ)
    5. 5.7.5 Normalized Radial Basis Function with Equal Heights and Unequal Widths (NRBFEH)
    6. 5.7.6 Normalized Radial Basis Function with Equal Widths and Unequal Heights (NRBFEW)
    7. 5.7.7 Normalized Radial Basis Function with Equal Volumes (NRBFEV)
    8. 5.7.8 Normalized Radial Basis Function with Unequal Widths and Heights (NRBFUN)
    9. 5.7.9 User-Specified Architectures
  54. 5.8 AutoNeural Node
  55. 5.9 DMNeural Node
  56. 5.10 Dmine Regression Node
  57. 5.11 Comparing the Models Generated by the DMNeural, AutoNeural, and Dmine Regression Nodes
  58. 5.12 Summary
  59. 5.13 Appendix to Chapter 5
  60. 5.14 Exercises
  61. Notes
  62. Chapter 6: Regression Models
  63. 6.1 Introduction
  64. 6.2 What Types of Models Can Be Developed Using the Regression Node?
    1. 6.2.1 Models with a Binary Target
    2. 6.2.2 Models with an Ordinal Target
    3. 6.2.3 Models with a Nominal (Unordered) Target
    4. 6.2.4 Models with Continuous Targets
  65. 6.3 An Overview of Some Properties of the Regression Node
    1. 6.3.1 Regression Type Property
    2. 6.3.2 Link Function Property
    3. 6.3.3 Selection Model Property
    4. 6.3.4 Selection Criterion Property
  66. 6.4 Business Applications
    1. 6.4.1 Logistic Regression for Predicting Response to a Mail Campaign
    2. 6.4.2 Regression for a Continuous Target
  67. 6.5 Summary
  68. 6.6 Appendix to Chapter 6
    1. 6.6.1 SAS Code
    2. 6.6.2 Examples of the selection criteria when the Model Selection property is set to Forward
  69. 6.7 Exercises
  70. Notes
  71. Chapter 7: Comparison and Combination of Different Models
  72. 7.1 Introduction
  73. 7.2 Models for Binary Targets: An Example of Predicting Attrition
    1. 7.2.1 Logistic Regression for Predicting Attrition
    2. 7.2.2 Decision Tree Model for Predicting Attrition
    3. 7.2.3 A Neural Network Model for Predicting Attrition
  74. 7.3 Models for Ordinal Targets: An Example of Predicting Accident Risk
    1. 7.3.1 Lift Charts and Capture Rates for Models with Ordinal Targets
    2. 7.3.2 Logistic Regression with Proportional Odds for Predicting Risk in Auto Insurance
    3. 7.3.3 Decision Tree Model for Predicting Risk in Auto Insurance
    4. 7.3.4 Neural Network Model for Predicting Risk in Auto Insurance
  75. 7.4 Comparison of All Three Accident Risk Models
  76. 7.5 Boosting and Combining Predictive Models
    1. 7.5.1 Gradient Boosting
    2. 7.5.2 Stochastic Gradient Boosting
    3. 7.5.3 An Illustration of Boosting Using the Gradient Boosting Node
    4. 7.5.4 The Ensemble Node
    5. 7.5.5 Comparing the Gradient Boosting and Ensemble Methods of Combining Models
  77. 7.6 Appendix to Chapter 7
    1. 7.6.1 Least Squares Loss
    2. 7.6.2 Least Absolute Deviation Loss
    3. 7.6.3 Huber-M Loss
    4. 7.6.4 Logit Loss
  78. 7.7 Exercises
  79. Note
  80. Chapter 8: Customer Profitability
  81. 8.1 Introduction
  82. 8.2 Acquisition Cost
  83. 8.3 Cost of Default
  84. 8.5 Profit
  85. 8.6 The Optimum Cutoff Point
  86. 8.7 Alternative Scenarios of Response and Risk
  87. 8.8 Customer Lifetime Value
  88. 8.9 Suggestions for Extending Results
  89. Note
  90. Chapter 9: Introduction to Predictive Modeling with Textual Data
  91. 9.1 Introduction
    1. 9.1.1 Quantifying Textual Data: A Simplified Example
    2. 9.1.2 Dimension Reduction and Latent Semantic Indexing
    3. 9.1.3 Summary of the Steps in Quantifying Textual Information
  92. 9.2 Retrieving Documents from the World Wide Web
    1. 9.2.1 The %TMFILTER Macro
  93. 9.3 Creating a SAS Data Set from Text Files
  94. 9.4 The Text Import Node
  95. 9.5 Creating a Data Source for Text Mining
  96. 9.6 Text Parsing Node
  97. 9.7 Text Filter Node
    1. 9.7.1 Frequency Weighting
    2. 9.7.2 Term Weighting
    3. 9.7.3 Adjusted Frequencies
    4. 9.7.4 Frequency Weighting Methods
    5. 9.7.5 Term Weighting Methods
  98. 9.8 Text Topic Node
    1. 9.8.1 Developing a Predictive Equation Using the Output Data Set Created by the Text Topic Node
  99. 9.9 Text Cluster Node
    1. 9.9.1 Hierarchical Clustering
    2. 9.9.2 Expectation-Maximization (EM) Clustering
    3. 9.9.3 Using the Text Cluster Node
  100. 9.10 Exercises
  101. Notes
  102. Index

Product information

  • Title: Predictive Modeling with SAS Enterprise Miner, 3rd Edition
  • Author(s): Kattamuri S. Sarma
  • Release date: July 2017
  • Publisher(s): SAS Institute
  • ISBN: 9781635260380