Table of Contents

  1. Chapter 1 Introduction: What Is Data Science?

    1. Big Data and Data Science Hype

    2. Getting Past the Hype

    3. Why Now?

    4. The Current Landscape (with a Little History)

    5. A Data Science Profile

    6. Thought Experiment: Meta-Definition

    7. OK, So What Is a Data Scientist, Really?

  2. Chapter 2 Statistical Inference, Exploratory Data Analysis, and the Data Science Process

    1. Statistical Thinking in the Age of Big Data

    2. Exploratory Data Analysis

    3. The Data Science Process

    4. Thought Experiment: How Would You Simulate Chaos?

    5. Case Study: RealDirect

  3. Chapter 3 Algorithms

    1. Machine Learning Algorithms

    2. Three Basic Algorithms

    3. Exercise: Basic Machine Learning Algorithms

    4. Summing It All Up

    5. Thought Experiment: Automated Statistician

  4. Chapter 4 Spam Filters, Naive Bayes, and Wrangling

    1. Thought Experiment: Learning by Example

    2. Naive Bayes

    3. Fancy It Up: Laplace Smoothing

    4. Comparing Naive Bayes to k-NN

    5. Sample Code in bash

    6. Scraping the Web: APIs and Other Tools

    7. Jake’s Exercise: Naive Bayes for Article Classification

  5. Chapter 5 Logistic Regression

    1. Thought Experiments

    2. Classifiers

    3. M6D Logistic Regression Case Study

    4. Media 6 Degrees Exercise

  6. Chapter 6 Time Stamps and Financial Modeling

    1. Kyle Teague and GetGlue

    2. Timestamps

    3. Cathy O’Neil

    4. Thought Experiment

    5. Financial Modeling

    6. Exercise: GetGlue and Timestamped Event Data

  7. Chapter 7 Extracting Meaning from Data

    1. William Cukierski

    2. The Kaggle Model

    3. Thought Experiment: What Are the Ethical Implications of a Robo-Grader?

    4. Feature Selection

    5. David Huffaker: Google’s Hybrid Approach to Social Research

  8. Chapter 8 Recommendation Engines: Building a User-Facing Data Product at Scale

    1. A Real-World Recommendation Engine

    2. Thought Experiment: Filter Bubbles

    3. Exercise: Build Your Own Recommendation System

  9. Chapter 9 Data Visualization and Fraud Detection

    1. Data Visualization History

    2. What Is Data Science, Redux?

    3. A Sample of Data Visualization Projects

    4. Mark’s Data Visualization Projects

    5. Data Science and Risk

    6. Data Visualization at Square

    7. Ian’s Thought Experiment

    8. Data Visualization for the Rest of Us

  10. Chapter 10 Social Networks and Data Journalism

    1. Social Network Analysis at Morningside Analytics

    2. Social Network Analysis

    3. Terminology from Social Networks

    4. Thought Experiment

    5. Morningside Analytics

    6. More Background on Social Network Analysis from a Statistical Point of View

    7. Data Journalism

  11. Chapter 11 Causality

    1. Correlation Doesn’t Imply Causation

    2. OK Cupid’s Attempt

    3. The Gold Standard: Randomized Clinical Trials

    4. A/B Tests

    5. Second Best: Observational Studies

    6. Three Pieces of Advice

  12. Chapter 12 Epidemiology

    1. Madigan’s Background

    2. Thought Experiment

    3. Modern Academic Statistics

    4. Medical Literature and Observational Studies

    5. Stratification Does Not Solve the Confounder Problem

    6. Is There a Better Way?

    7. Research Experiment (Observational Medical Outcomes Partnership)

    8. Closing Thought Experiment

  13. Chapter 13 Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

    1. Claudia’s Data Scientist Profile

    2. Data Mining Competitions

    3. How to Be a Good Modeler

    4. Data Leakage

    5. How to Avoid Leakage

    6. Evaluating Models

    7. Choosing an Algorithm

    8. A Final Example

    9. Parting Thoughts

  14. Chapter 14 Data Engineering: MapReduce, Pregel, and Hadoop

    1. About David Crawshaw

    2. Thought Experiment

    3. MapReduce

    4. Word Frequency Problem

    5. Other Examples of MapReduce

    6. Pregel

    7. About Josh Wills

    8. Thought Experiment

    9. On Being a Data Scientist

    10. Economic Interlude: Hadoop

    11. Back to Josh: Workflow

    12. So How to Get Started with Hadoop?

  15. Chapter 15 The Students Speak

    1. Process Thinking

    2. Naive No Longer

    3. Helping Hands

    4. Your Mileage May Vary

    5. Bridging Tunnels

    6. Some of Our Work

  16. Chapter 16 Next-Generation Data Scientists, Hubris, and Ethics

    1. What Just Happened?

    2. What Is Data Science (Again)?

    3. What Are Next-Gen Data Scientists?

    4. Being an Ethical Data Scientist

    5. Career Advice

  1. Index

  2. Colophon