Getting Started with Storm

Book description

Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives.

Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing.

  • Learn how to program Storm components: spouts for data input and bolts for data transformation
  • Discover how data is exchanged between spouts and bolts in a Storm topology
  • Make spouts fault-tolerant with several commonly used design strategies
  • Explore bolts—their life cycle, strategies for design, and ways to implement them
  • Scale your solution by defining each component’s level of parallelism
  • Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology
  • Write spouts and bolts with non-JVM languages such as Python, Ruby, and JavaScript

Table of contents

  1. Getting Started with Storm
  2. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  3. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. Safari® Books Online
    4. How to Contact Us
    5. Acknowledgements
  4. 1. Basics
    1. The Components of Storm
    2. The Properties of Storm
  5. 2. Getting Started
    1. Operation Modes
      1. Local Mode
      2. Remote Mode
    2. Hello World Storm
      1. Checking Java Installation
      2. Creating the Project
    3. Creating Our First Topology
      1. Spout
      2. Bolts
      3. The Main Class
      4. See It In Action
    4. Conclusion
  6. 3. Topologies
    1. Stream Grouping
      1. Shuffle Grouping
      2. Fields Grouping
      3. All Grouping
      4. Custom Grouping
      5. Direct Grouping
      6. Global Grouping
      7. None Grouping
    2. LocalCluster versus StormSubmitter
    3. DRPC Topologies
  7. 4. Spouts
    1. Reliable versus Unreliable Messages
    2. Getting Data
      1. Direct Connection
      2. Enqueued Messages
      3. DRPC
    3. Conclusion
  8. 5. Bolts
    1. Bolt Lifecycle
    2. Bolt Structure
    3. Reliable versus Unreliable Bolts
    4. Multiple Streams
    5. Multiple Anchoring
    6. Using IBasicBolt to Ack Automatically
  9. 6. A Real-Life Example
    1. The Node.js Web Application
    2. Starting the Node.js Web Application
    3. The Storm Topology
      1. UsersNavigationSpout
      2. GetCategoryBolt
      3. UserHistoryBolt
      4. ProductCategoriesCounterBolt
      5. NewsNotifierBolt
    4. The Redis Server
      1. Product Information
      2. User Navigation Queue
      3. Intermediate Data
      4. Results
    5. Testing the Topology
      1. Test Initialization
      2. A Test Example
    6. Notes on Scalability and Availability
  10. 7. Using Non-JVM Languages with Storm
    1. The Multilang Protocol Specification
      1. Initial Handshake
      2. Start Looping and Read or Write Tuples
        1. Emit
        2. Ack
        3. Fail
        4. Log
  11. 8. Transactional Topologies
    1. The Design
    2. Transactions in Action
      1. The Spout
        1. The RQ class
        2. The Coordinator
        3. The Emitter
      2. The Bolts
      3. The Committer Bolts
    3. Partitioned Transactional Spouts
    4. Opaque Transactional Topologies
  12. A. Installing the Storm Client
  13. B. Installing Storm Cluster
  14. C. Real Life Example Setup
    1. Installing Redis
    2. Installing Node.js
    3. Building and Testing
    4. Running the Topology
    5. Playing with the Example
  15. About the Authors
  16. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  17. Copyright

Product information

  • Title: Getting Started with Storm
  • Author(s): Jonathan Leibiusky, Gabriel Eisbruch, Dario Simonassi
  • Release date: August 2012
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781449324049