MongoDB: The Definitive Guide, 2nd Edition

Book description

Manage the huMONGOus amount of data collected through your web application with MongoDB. This authoritative introduction—written by a core contributor to the project—shows you the many advantages of using document-oriented databases, and demonstrates how this reliable, high-performance system allows for almost infinite horizontal scalability.

This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.

  • Get started with MongoDB core concepts and vocabulary
  • Perform basic write operations at different levels of safety and speed
  • Create complex queries, with options for limiting, skipping, and sorting results
  • Design an application that works well with MongoDB
  • Aggregate data, including counting, finding distinct values, grouping documents, and using MapReduce
  • Gather and interpret statistics about your collections and databases
  • Set up replica sets and automatic failover in MongoDB
  • Use sharding to scale horizontally, and learn how it impacts applications
  • Delve into monitoring, security and authentication, backup/restore, and other administrative tasks

Table of contents

  1. Foreword
  2. Preface
    1. How This Book Is Organized
      1. Getting Started with MongoDB
      2. Developing with MongoDB
      3. Replication
      4. Sharding
      5. Application Administration
      6. Server Administration
      7. Appendixes
    2. Conventions Used in This Book
    3. Using Code Examples
    4. Safari® Books Online
    5. How to Contact Us
    6. Acknowledgments
  3. I. Introduction to MongoDB
    1. 1. Introduction
      1. Ease of Use
      2. Easy Scaling
      3. Tons of Features…
      4. …Without Sacrificing Speed
      5. Let’s Get Started
    2. 2. Getting Started
      1. Documents
      2. Collections
        1. Dynamic Schemas
        2. Naming
          1. Subcollections
      3. Databases
      4. Getting and Starting MongoDB
      5. Introduction to the MongoDB Shell
        1. Running the Shell
        2. A MongoDB Client
        3. Basic Operations with the Shell
          1. Create
          2. Read
          3. Update
          4. Delete
      6. Data Types
        1. Basic Data Types
        2. Dates
        3. Arrays
        4. Embedded Documents
        5. _id and ObjectIds
          1. ObjectIds
          2. Autogeneration of _id
      7. Using the MongoDB Shell
        1. Tips for Using the Shell
        2. Running Scripts with the Shell
        3. Creating a .mongorc.js
        4. Customizing Your Prompt
        5. Editing Complex Variables
        6. Inconvenient Collection Names
    3. 3. Creating, Updating, and Deleting Documents
      1. Inserting and Saving Documents
        1. Bulk Insert
        2. Insert Validation
      2. Removing Documents
        1. Remove Speed
      3. Updating Documents
        1. Document Replacement
        2. Using Modifiers
          1. Getting started with the “$set” modifier
          2. Incrementing and decrementing
          3. Array modifiers
          4. Adding elements
          5. Using arrays as sets
          6. Removing elements
          7. Positional array modifications
          8. Modifier speed
        3. Upserts
          1. The save shell helper
        4. Updating Multiple Documents
        5. Returning Updated Documents
      4. Setting a Write Concern
    4. 4. Querying
      1. Introduction to find
        1. Specifying Which Keys to Return
        2. Limitations
      2. Query Criteria
        1. Query Conditionals
        2. OR Queries
        3. $not
        4. Conditional Semantics
      3. Type-Specific Queries
        1. null
        2. Regular Expressions
        3. Querying Arrays
          1. $all
          2. $size
          3. The $slice operator
          4. Returning a matching array element
          5. Array and range query interactions
        4. Querying on Embedded Documents
      4. $where Queries
        1. Server-Side Scripting
      5. Cursors
        1. Limits, Skips, and Sorts
          1. Comparison order
        2. Avoiding Large Skips
          1. Paginating results without skip
          2. Finding a random document
        3. Advanced Query Options
        4. Getting Consistent Results
        5. Immortal Cursors
      6. Database Commands
        1. How Commands Work
  4. II. Designing Your Application
    1. 5. Indexing
      1. Introduction to Indexing
        1. Introduction to Compound Indexes
        2. Using Compound Indexes
          1. Choosing key directions
          2. Using covered indexes
          3. Implicit indexes
        3. How $-Operators Use Indexes
          1. Inefficient operators
          2. Ranges
          3. OR queries
        4. Indexing Objects and Arrays
          1. Indexing embedded docs
          2. Indexing arrays
          3. Multikey index implications
        5. Index Cardinality
      2. Using explain() and hint()
        1. The Query Optimizer
      3. When Not to Index
      4. Types of Indexes
        1. Unique Indexes
          1. Compound unique indexes
          2. Dropping duplicates
        2. Sparse Indexes
      5. Index Administration
        1. Identifying Indexes
        2. Changing Indexes
    2. 6. Special Index and Collection Types
      1. Capped Collections
        1. Creating Capped Collections
        2. Sorting Au Naturel
        3. Tailable Cursors
        4. No-_id Collections
      2. Time-To-Live Indexes
      3. Full-Text Indexes
        1. Search Syntax
        2. Full-Text Search Optimization
        3. Searching in Other Languages
      4. Geospatial Indexing
        1. Types of Geospatial Queries
        2. Compound Geospatial Indexes
        3. 2D Indexes
      5. Storing Files with GridFS
        1. Getting Started with GridFS: mongofiles
        2. Working with GridFS from the MongoDB Drivers
        3. Under the Hood
    3. 7. Aggregation
      1. The Aggregation Framework
      2. Pipeline Operations
        1. $match
        2. $project
          1. Pipeline expressions
            1. Mathematical expressions
            2. Date expressions
            3. String expressions
            4. Logical expressions
          2. A projection example
        3. $group
          1. Grouping operators
            1. Arithmetic operators
            2. Extreme operators
            3. Array operators
          2. Grouping behavior
        4. $unwind
        5. $sort
        6. $limit
        7. $skip
        8. Using Pipelines
      3. MapReduce
        1. Example 1: Finding All Keys in a Collection
        2. Example 2: Categorizing Web Pages
        3. MongoDB and MapReduce
          1. The finalize function
          2. Keeping output collections
          3. MapReduce on a subset of documents
          4. Using a scope
          5. Getting more output
      4. Aggregation Commands
        1. count
        2. distinct
        3. group
          1. Using a finalizer
          2. Using a function as a key
    4. 8. Application Design
      1. Normalization versus Denormalization
        1. Examples of Data Representations
        2. Cardinality
        3. Friends, Followers, and Other Inconveniences
          1. Dealing with the Wil Wheaton effect
      2. Optimizations for Data Manipulation
        1. Optimizing for Document Growth
        2. Removing Old Data
      3. Planning Out Databases and Collections
      4. Managing Consistency
      5. Migrating Schemas
      6. When Not to Use MongoDB
  5. III. Replication
    1. 9. Setting Up a Replica Set
      1. Introduction to Replication
      2. A One-Minute Test Setup
      3. Configuring a Replica Set
        1. rs Helper Functions
        2. Networking Considerations
      4. Changing Your Replica Set Configuration
      5. How to Design a Set
        1. How Elections Work
      6. Member Configuration Options
        1. Creating Election Arbiters
          1. Use at most one arbiter
          2. The downside to using an arbiter
        2. Priority
        3. Hidden
        4. Slave Delay
        5. Building Indexes
    2. 10. Components of a Replica Set
      1. Syncing
        1. Initial Sync
        2. Handling Staleness
      2. Heartbeats
        1. Member States
      3. Elections
      4. Rollbacks
        1. When Rollbacks Fail
    3. 11. Connecting to a Replica Set from Your Application
      1. Client-to-Replica-Set Connection Behavior
      2. Waiting for Replication on Writes
        1. What Can Go Wrong?
        2. Other Options for “w”
      3. Custom Replication Guarantees
        1. Guaranteeing One Server per Data Center
        2. Guaranteeing a Majority of Nonhidden Members
        3. Creating Other Guarantees
      4. Sending Reads to Secondaries
        1. Consistency Considerations
        2. Load Considerations
        3. Reasons to Read from Secondaries
    4. 12. Administration
      1. Starting Members in Standalone Mode
      2. Replica Set Configuration
        1. Creating a Replica Set
        2. Changing Set Members
        3. Creating Larger Sets
        4. Forcing Reconfiguration
      3. Manipulating Member State
        1. Turning Primaries into Secondaries
        2. Preventing Elections
        3. Using Maintenance Mode
      4. Monitoring Replication
        1. Getting the Status
        2. Visualizing the Replication Graph
        3. Replication Loops
        4. Disabling Chaining
        5. Calculating Lag
        6. Resizing the Oplog
        7. Restoring from a Delayed Secondary
        8. Building Indexes
        9. Replication on a Budget
        10. How the Primary Tracks Lag
      5. Master-Slave
        1. Converting Master-Slave to a Replica Set
        2. Mimicking Master-Slave Behavior with Replica Sets
  6. IV. Sharding
    1. 13. Introduction to Sharding
      1. Introduction to Sharding
      2. Understanding the Components of a Cluster
      3. A One-Minute Test Setup
    2. 14. Configuring Sharding
      1. When to Shard
      2. Starting the Servers
        1. Config Servers
        2. The mongos Processes
        3. Adding a Shard from a Replica Set
        4. Adding Capacity
        5. Sharding Data
      3. How MongoDB Tracks Cluster Data
        1. Chunk Ranges
        2. Splitting Chunks
      4. The Balancer
    3. 15. Choosing a Shard Key
      1. Taking Stock of Your Usage
      2. Picturing Distributions
        1. Ascending Shard Keys
        2. Randomly Distributed Shard Keys
        3. Location-Based Shard Keys
      3. Shard Key Strategies
        1. Hashed Shard Key
        2. Hashed Shard Keys for GridFS
        3. The Firehose Strategy
        4. Multi-Hotspot
      4. Shard Key Rules and Guidelines
        1. Shard Key Limitations
        2. Shard Key Cardinality
      5. Controlling Data Distribution
        1. Using a Cluster for Multiple Databases and Collections
        2. Manual Sharding
    4. 16. Sharding Administration
      1. Seeing the Current State
        1. Getting a Summary with sh.status
        2. Seeing Configuration Information
          1. config.shards
          2. config.databases
          3. config.collections
          4. config.chunks
          5. config.changelog
          6. config.tags
          7. config.settings
      2. Tracking Network Connections
        1. Getting Connection Statistics
        2. Limiting the Number of Connections
      3. Server Administration
        1. Adding Servers
        2. Changing Servers in a Shard
          1. Changing a shard from a standalone server to replica set
        3. Removing a Shard
        4. Changing Config Servers
      4. Balancing Data
        1. The Balancer
        2. Changing Chunk Size
        3. Moving Chunks
        4. Jumbo Chunks
          1. Distributing jumbo chunks
          2. Preventing jumbo chunks
        5. Refreshing Configurations
  7. V. Application Administration
    1. 17. Seeing What Your Application Is Doing
      1. Seeing the Current Operations
        1. Finding Problematic Operations
        2. Killing Operations
        3. False Positives
        4. Preventing Phantom Operations
      2. Using the System Profiler
      3. Calculating Sizes
        1. Documents
        2. Collections
        3. Databases
      4. Using mongotop and mongostat
    2. 18. Data Administration
      1. Setting Up Authentication
        1. Authentication Basics
        2. Setting Up Authentication
        3. How Authentication Works
      2. Creating and Deleting Indexes
        1. Creating an Index on a Standalone Server
        2. Creating an Index on a Replica Set
        3. Creating an Index on a Sharded Cluster
        4. Removing Indexes
        5. Beware of the OOM Killer
      3. Preheating Data
        1. Moving Databases into RAM
        2. Moving Collections into RAM
        3. Custom-Preheating
      4. Compacting Data
      5. Moving Collections
      6. Preallocating Data Files
    3. 19. Durability
      1. What Journaling Does
        1. Planning Commit Batches
        2. Setting Commit Intervals
      2. Turning Off Journaling
        1. Replacing Data Files
        2. Repairing Data Files
        3. The mongod.lock File
        4. Sneaky Unclean Shutdowns
      3. What MongoDB Does Not Guarantee
      4. Checking for Corruption
      5. Durability with Replication
  8. VI. Server Administration
    1. 20. Starting and Stopping MongoDB
      1. Starting from the Command Line
        1. File-Based Configuration
      2. Stopping MongoDB
      3. Security
        1. Data Encryption
        2. SSL Connections
      4. Logging
    2. 21. Monitoring MongoDB
      1. Monitoring Memory Usage
        1. Introduction to Computer Memory
        2. Tracking Memory Usage
        3. Tracking Page Faults
        4. Minimizing Btree Misses
        5. IO Wait
        6. Tracking Background Flush Averages
      2. Calculating the Working Set
        1. Some Working Set Examples
      3. Tracking Performance
        1. Tracking Free Space
      4. Monitoring Replication
    3. 22. Making Backups
      1. Backing Up a Server
        1. Filesystem Snapshot
        2. Copying Data Files
        3. Using mongodump
          1. Moving collections and databases with mongodump and mongorestore
          2. Administrative complications with unique indexes
      2. Backing Up a Replica Set
      3. Backing Up a Sharded Cluster
        1. Backing Up and Restoring an Entire Cluster
        2. Backing Up and Restoring a Single Shard
      4. Creating Incremental Backups with mongooplog
    4. 23. Deploying MongoDB
      1. Designing the System
        1. Choosing a Storage Medium
          1. An example from the wild
        2. Recommended RAID Configurations
        3. CPU
        4. Choosing an Operating System
        5. Swap Space
        6. Filesystem
      2. Virtualization
        1. Turn Off Memory Overcommitting
        2. Mystery Memory
        3. Handling Network Disk IO Issues
        4. Using Non-Networked Disks
      3. Configuring System Settings
        1. Turning Off NUMA
        2. Setting a Sane Readahead
        3. Disabling Hugepages
        4. Choosing a Disk Scheduling Algorithm
        5. Don’t Track Access Time
        6. Modifying Limits
      4. Configuring Your Network
      5. System Housekeeping
        1. Synchronizing Clocks
        2. The OOM Killer
        3. Turn Off Periodic Tasks
  9. A. Installing MongoDB
    1. Choosing a Version
    2. Windows Install
      1. Installing as a Service
    3. POSIX (Linux, Mac OS X, and Solaris) Install
      1. Installing from a Package Manager
  10. B. MongoDB Internals
    1. BSON
    2. Wire Protocol
    3. Data Files
    4. Namespaces and Extents
    5. Memory-Mapped Storage Engine
  11. Index
  12. Colophon
  13. Copyright

Product information

  • Title: MongoDB: The Definitive Guide, 2nd Edition
  • Author(s): Kristina Chodorow
  • Release date: May 2013
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781449344689