Table of Contents

  1. Chapter 1 Secondary Sort: Introduction

    1. Solutions to the Secondary Sort Problem

    2. MapReduce/Hadoop Solution to Secondary Sort

    3. Spark Solution to Secondary Sort

  2. Chapter 2 Secondary Sort: A Detailed Example

    1. Secondary Sorting Technique

    2. Complete Example of Secondary Sorting

    3. Sample Run—Old Hadoop API

    4. Sample Run—New Hadoop API

  3. Chapter 3 Top 10 List

    1. Top N, Formalized

    2. MapReduce/Hadoop Implementation: Unique Keys

    3. Spark Implementation: Unique Keys

    4. Spark Implementation: Nonunique Keys

    5. Spark Top 10 Solution Using takeOrdered()

    6. MapReduce/Hadoop Top 10 Solution: Nonunique Keys

  4. Chapter 4 Left Outer Join

    1. Left Outer Join Example

    2. Implementation of Left Outer Join in MapReduce

    3. Spark Implementation of Left Outer Join

    4. Spark Implementation with leftOuterJoin()

  5. Chapter 5 Order Inversion

    1. Example of the Order Inversion Pattern

    2. MapReduce/Hadoop Implementation of the Order Inversion Pattern

    3. Sample Run

  6. Chapter 6 Moving Average

    1. Example 1: Time Series Data (Stock Prices)

    2. Example 2: Time Series Data (URL Visits)

    3. Formal Definition

    4. POJO Moving Average Solutions

    5. MapReduce/Hadoop Moving Average Solution

  7. Chapter 7 Market Basket Analysis

    1. MBA Goals

    2. Application Areas for MBA

    3. Market Basket Analysis Using MapReduce

    4. Spark Solution

  8. Chapter 8 Common Friends

    1. Input

    2. POJO Common Friends Solution

    3. MapReduce Algorithm

    4. Solution 1: Hadoop Implementation Using Text

    5. Solution 2: Hadoop Implementation Using ArrayListOfLongsWritable

    6. Spark Solution

  9. Chapter 9 Recommendation Engines Using MapReduce

    1. Customers Who Bought This Item Also Bought

    2. Frequently Bought Together

    3. Recommend Connection

  10. Chapter 10 Content-Based Recommendation: Movies

    1. Input

    2. MapReduce Phase 1

    3. MapReduce Phases 2 and 3

    4. Movie Recommendation Implementation in Spark

  11. Chapter 11 Smarter Email Marketing with the Markov Model

    1. Markov Chains in a Nutshell

    2. Markov Model Using MapReduce

    3. Spark Solution

  12. Chapter 12 K-Means Clustering

    1. What Is K-Means Clustering?

    2. Application Areas for Clustering

    3. Informal K-Means Clustering Method: Partitioning Approach

    4. K-Means Distance Function

    5. K-Means Clustering Formalized

    6. MapReduce Solution for K-Means Clustering

    7. K-Means Implementation by Spark

  13. Chapter 13 k-Nearest Neighbors

    1. kNN Classification

    2. Distance Functions

    3. kNN Example

    4. An Informal kNN Algorithm

    5. Formal kNN Algorithm

    6. Java-like Non-MapReduce Solution for kNN

    7. kNN Implementation in Spark

  14. Chapter 14 Naive Bayes

    1. Training and Learning Examples

    2. Conditional Probability

    3. The Naive Bayes Classifier in Depth

    4. The Naive Bayes Classifier: MapReduce Solution for Symbolic Data

    5. The Naive Bayes Classifier: MapReduce Solution for Numeric Data

    6. Naive Bayes Classifier Implementation in Spark

    7. Using Spark and Mahout

  15. Chapter 15 Sentiment Analysis

    1. Sentiment Examples

    2. Sentiment Scores: Positive or Negative

    3. A Simple MapReduce Sentiment Analysis Example

    4. Sentiment Analysis in the Real World

  16. Chapter 16 Finding, Counting, and Listing All Triangles in Large Graphs

    1. Basic Graph Concepts

    2. Importance of Counting Triangles

    3. MapReduce/Hadoop Solution

    4. Spark Solution

  17. Chapter 17 K-mer Counting

    1. Input Data for K-mer Counting

    2. Applications of K-mer Counting

    3. K-mer Counting Solution in MapReduce/Hadoop

    4. K-mer Counting Solution in Spark

  18. Chapter 18 DNA Sequencing

    1. Input Data for DNA Sequencing

    2. Input Data Validation

    3. DNA Sequence Alignment

    4. MapReduce Algorithms for DNA Sequencing

  19. Chapter 19 Cox Regression

    1. The Cox Model in a Nutshell

    2. Cox Regression Using R

    3. Cox Regression Application

    4. Cox Regression POJO Solution

    5. Input for MapReduce

    6. Cox Regression Using MapReduce

  20. Chapter 20 Cochran-Armitage Test for Trend

    1. Cochran-Armitage Algorithm

    2. Application of Cochran-Armitage

    3. MapReduce Solution

  21. Chapter 21 Allelic Frequency

    1. Basic Definitions

    2. Formal Problem Statement

    3. MapReduce Solution for Allelic Frequency

    4. MapReduce Solution, Phase 1

    5. MapReduce Solution, Phase 2

    6. MapReduce Solution, Phase 3

    7. Special Handling of Chromosomes X and Y

  22. Chapter 22 The T-Test

    1. Performing the T-Test on Biosets

    2. MapReduce Problem Statement

    3. Input

    4. Expected Output

    5. MapReduce Solution

    6. Spark Implementation

  23. Chapter 23 Pearson Correlation

    1. Pearson Correlation Formula

    2. Pearson Correlation Example

    3. Data Set for Pearson Correlation

    4. POJO Solution for Pearson Correlation

    5. POJO Solution Test Drive

    6. MapReduce Solution for Pearson Correlation

    7. Hadoop Implementation Classes

    8. Spark Solution for Pearson Correlation

    9. Spearman Correlation Using Spark

  24. Chapter 24 DNA Base Count

    1. FASTA Format

    2. FASTQ Format

    3. MapReduce Solution: FASTA Format

    4. Sample Run

    5. MapReduce Solution: FASTQ Format

    6. Spark Solution: FASTA Format

    7. Spark Solution: FASTQ Format

  25. Chapter 25 RNA Sequencing

    1. Data Size and Format

    2. MapReduce Workflow

    3. RNA Sequencing Analysis Overview

    4. MapReduce Algorithms for RNA Sequencing

  26. Chapter 26 Gene Aggregation

    1. Input

    2. Output

    3. MapReduce Solutions (Filter by Individual and by Average)

    4. Gene Aggregation in Spark

    5. Spark Solution: Filter by Individual

    6. Spark Solution: Filter by Average

  27. Chapter 27 Linear Regression

    1. Basic Definitions

    2. Simple Example

    3. Problem Statement

    4. Input Data

    5. Expected Output

    6. MapReduce Solution Using SimpleRegression

    7. Hadoop Implementation Classes

    8. MapReduce Solution Using R’s Linear Model

  28. Chapter 28 MapReduce and Monoids

    1. Introduction

    2. Definition of Monoid

    3. Monoidic and Non-Monoidic Examples

    4. MapReduce Example: Not a Monoid

    5. MapReduce Example: Monoid

    6. Spark Example Using Monoids

    7. Conclusion on Using Monoids

    8. Functors and Monoids

  29. Chapter 29 The Small Files Problem

    1. Solution 1: Merging Small Files Client-Side

    2. Solution 2: Solving the Small Files Problem with CombineFileInputFormat

    3. Alternative Solutions

  30. Chapter 30 Huge Cache for MapReduce

    1. Implementation Options

    2. Formalizing the Cache Problem

    3. An Elegant, Scalable Solution

    4. Implementing the LRUMap Cache

    5. MapReduce Using the LRUMap Cache

  31. Chapter 31 The Bloom Filter

    1. Bloom Filter Properties

    2. A Simple Bloom Filter Example

    3. Bloom Filters in Guava Library

    4. Using Bloom Filters in MapReduce

  32. Appendix Bioset

  33. Appendix Spark RDDs

    1. Spark Operations

    2. Tuple<N>

    3. RDDs