Data Science for Business
What You Need to Know about Data Mining and Data-Analytic Thinking
Publisher: O'Reilly Media
Release Date: August 2013
Pages: 414
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll learn not only how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically and fully appreciate how data science methods can support business decision-making.
 Understand how data science fits in your organization—and how you can use it for competitive advantage
 Treat data as a business asset that requires careful investment if you’re to gain real value
 Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
 Learn general concepts for actually extracting knowledge from data
 Apply data science principles when interviewing data science job candidates
Table of Contents

Chapter 1 Introduction: Data-Analytic Thinking

The Ubiquity of Data Opportunities

Example: Hurricane Frances

Example: Predicting Customer Churn

Data Science, Engineering, and Data-Driven Decision Making

Data Processing and “Big Data”

From Big Data 1.0 to Big Data 2.0

Data and Data Science Capability as a Strategic Asset

Data-Analytic Thinking

This Book

Data Mining and Data Science, Revisited

Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist

Summary


Chapter 2 Business Problems and Data Science Solutions

From Business Problems to Data Mining Tasks

Supervised Versus Unsupervised Methods

Data Mining and Its Results

The Data Mining Process

Implications for Managing the Data Science Team

Other Analytics Techniques and Technologies

Summary


Chapter 3 Introduction to Predictive Modeling: From Correlation to Supervised Segmentation

Models, Induction, and Prediction

Supervised Segmentation

Visualizing Segmentations

Trees as Sets of Rules

Probability Estimation

Example: Addressing the Churn Problem with Tree Induction

Summary


Chapter 4 Fitting a Model to Data

Classification via Mathematical Functions

Regression via Mathematical Functions

Class Probability Estimation and Logistic “Regression”

Example: Logistic Regression versus Tree Induction

Nonlinear Functions, Support Vector Machines, and Neural Networks

Summary


Chapter 5 Overfitting and Its Avoidance

Generalization

Overfitting

Overfitting Examined

Example: Overfitting Linear Functions

* Example: Why Is Overfitting Bad?

From Holdout Evaluation to Cross-Validation

The Churn Dataset Revisited

Learning Curves

Overfitting Avoidance and Complexity Control

Summary


Chapter 6 Similarity, Neighbors, and Clusters

Similarity and Distance

Nearest-Neighbor Reasoning

Some Important Technical Details Relating to Similarities and Neighbors

Clustering

Stepping Back: Solving a Business Problem Versus Data Exploration

Summary


Chapter 7 Decision Analytic Thinking I: What Is a Good Model?

Evaluating Classifiers

Generalizing Beyond Classification

A Key Analytical Framework: Expected Value

Evaluation, Baseline Performance, and Implications for Investments in Data

Summary


Chapter 8 Visualizing Model Performance

Ranking Instead of Classifying

Profit Curves

ROC Graphs and Curves

The Area Under the ROC Curve (AUC)

Cumulative Response and Lift Curves

Example: Performance Analytics for Churn Modeling

Summary


Chapter 9 Evidence and Probabilities

Example: Targeting Online Consumers With Advertisements

Combining Evidence Probabilistically

Applying Bayes’ Rule to Data Science

A Model of Evidence “Lift”

Example: Evidence Lifts from Facebook "Likes"

Summary


Chapter 10 Representing and Mining Text

Why Text Is Important

Why Text Is Difficult

Representation

Example: Jazz Musicians

* The Relationship of IDF to Entropy

Beyond Bag of Words

Example: Mining News Stories to Predict Stock Price Movement

Summary


Chapter 11 Decision Analytic Thinking II: Toward Analytical Engineering

Targeting the Best Prospects for a Charity Mailing

Our Churn Example Revisited with Even More Sophistication


Chapter 12 Other Data Science Tasks and Techniques

Co-occurrences and Associations: Finding Items That Go Together

Profiling: Finding Typical Behavior

Link Prediction and Social Recommendation

Data Reduction, Latent Information, and Movie Recommendation

Bias, Variance, and Ensemble Methods

Data-Driven Causal Explanation and a Viral Marketing Example

Summary


Chapter 13 Data Science and Business Strategy

Thinking Data-Analytically, Redux

Achieving Competitive Advantage with Data Science

Sustaining Competitive Advantage with Data Science

Attracting and Nurturing Data Scientists and Their Teams

Examine Data Science Case Studies

Be Ready to Accept Creative Ideas from Any Source

Be Ready to Evaluate Proposals for Data Science Projects

A Firm’s Data Science Maturity


Chapter 14 Conclusion

The Fundamental Concepts of Data Science

What Data Can’t Do: Humans in the Loop, Revisited

Privacy, Ethics, and Mining Data About Individuals

Is There More to Data Science?

Final Example: From Crowd-Sourcing to Cloud-Sourcing

Final Words


Appendix Proposal Review Guide

Business and Data Understanding

Data Preparation

Modeling

Evaluation and Deployment


Appendix Another Sample Proposal

Scenario and Proposal


Glossary

Appendix Bibliography

Index

Colophon