Fast Data: Smart and at Scale

Book description

The need for fast data applications is growing rapidly, driven by the IoT, the surge in machine-to-machine (M2M) data, global mobile device proliferation, and the monetization of SaaS platforms. So how do you combine real-time, streaming analytics with real-time decisions in an architecture that’s reliable, scalable, and simple?

In this O’Reilly report, Ryan Betts and John Hugg from VoltDB examine ways to develop apps for fast data using pre-defined patterns. These patterns are general enough to suit both the do-it-yourself, hybrid batch/streaming approach and the simpler, proven in-memory approach available with certain fast database offerings.

Their goal is to create a collection of fast data app development recipes. Reader contributions are welcome; they will be tested and included in future editions of this report.

Table of contents

  1. Foreword
  2. Fast Data Application Value
    1. Looking Beyond Streaming
  3. Fast Data and the Enterprise
  4. 1. What Is Fast Data?
    1. Applications of Fast Data
      1. Ingestion
      2. Streaming Analytics
      3. Per-Event Transactions
    2. Uses of Fast Data
      1. Front End for Hadoop
      2. Enriching Streaming Data
      3. Queryable Cache
  5. 2. Disambiguating ACID and CAP
    1. What Is ACID?
      1. What Does ACID Stand For?
    2. What Is CAP?
      1. What Does CAP Stand For?
    3. How Is CAP Consistency Different from ACID Consistency?
    4. What Does “Eventual Consistency” Mean in This Context?
  6. 3. Recipe: Integrate Streaming Aggregations and Transactions
    1. Idea in Brief
    2. Pattern: Reject Requests Past a Threshold
    3. Pattern: Alerting on Variations from Predicted Trends
    4. When to Avoid This Pattern
    5. Related Concepts
  7. 4. Recipe: Design Data Pipelines
    1. Idea in Brief
    2. Pattern: Use Streaming Transformations to Avoid ETL
    3. Pattern: Connect Big Data Analytics to Real-Time Stream Processing
    4. Pattern: Use Loose Coupling to Improve Reliability
    5. When to Avoid Pipelines
  8. 5. Recipe: Pick Failure-Recovery Strategies
    1. Idea in Brief
    2. Pattern: At-Most-Once Delivery
    3. Pattern: At-Least-Once Delivery
    4. Pattern: Exactly-Once Delivery
  9. 6. Recipe: Combine At-Least-Once Delivery with Idempotent Processing to Achieve Exactly-Once Semantics
    1. Idea in Brief
    2. Pattern: Use Upserts Over Inserts
    3. Pattern: Tag Data with Unique Identifiers
      1. Subpattern: Fine-Grained Timestamps
      2. Subpattern: Unique IDs at the Event Source
    4. Pattern: Use Kafka Offsets as Unique Identifiers
    5. Example: Call Center Processing
      1. Version 1: Events Are Ordered
      2. Version 2: Events Are Not Ordered
    6. When to Avoid This Pattern
    7. Related Concepts and Techniques
  10. Glossary

Product information

  • Title: Fast Data: Smart and at Scale
  • Author(s): Ryan Betts, John Hugg
  • Release date: October 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491940372