Understanding Compression

Book description

If you want to attract and retain users in the booming mobile services market, you need a quick-loading app that won’t churn through their data plans. The key is to compress multimedia and other data into smaller files, but finding the right method is tricky. This witty book helps you understand how data compression algorithms work—in theory and practice—so you can choose the best solution among all the available compression tools.

With tables, diagrams, games, and as little math as possible, authors Colt McAnlis and Aleks Haecky neatly explain the fundamentals. Learn how compressed files are better, cheaper, and faster to distribute and consume, and how they’ll give you a competitive edge.

  • Learn why compression has become crucial as data production continues to skyrocket
  • Know your data, circumstances, and algorithm options when choosing compression tools
  • Explore variable-length codes, statistical compression, arithmetic numerical coding, dictionary encodings, and context modeling
  • Examine tradeoffs between file size and quality when choosing image compressors
  • Learn ways to compress client- and server-generated data objects
  • Meet the inventors and visionaries who created data compression algorithms

Publisher resources

View/Submit Errata

Table of contents

  1. Foreword
  2. Preface
    1. How to Read This Book
    2. How to Read This Book Backwards
    3. Chapter Synopsis
  3. 1. Let’s Not Be Boring
    1. The Five Buckets of Compression Algorithms
    2. Claude Shannon Is Infuriating!
    3. The Only Thing You Need to Know about Data Compression
      1. A World Built on Data Compression
  4. 2. Do Not Skip This Chapter
    1. Understanding Binary
      1. Base 10 System
      2. Binary Number System
    2. Information Theory
      1. An Excursion into Binary Search
      2. Entropy: The Minimum Bits Needed to Represent a Number
      3. Standard Number Lengths
  5. 3. Breaking Entropy
    1. Understanding Entropy
    2. What This Entropy Stuff Is Good For
    3. Understanding Probability
    4. Breaking Entropy
      1. Example 1: Delta Coding
      2. Example 2: Symbol Grouping
      3. Example 3: Permutations
    5. Information Theory Versus Data Compression
  6. 4. Variable-Length Codes
    1. Morse Code
    2. Probability, Entropy, and Codeword Size
    3. Variable-Length Codes
      1. Using VLCs
      2. Creating VLCs
      3. A Handful of Example VLCs
      4. Finding the Right Code for Your Data Set
  7. 5. Statistical Encoding
    1. Statistically Compressing to Entropy
    2. Huffman Coding
      1. Building a Huffman Tree
      2. Generating Codewords
      3. Encoding and Decoding
      4. Practical Implementations
    3. Arithmetic Coding
      1. Finding the Right Number
      2. Encoding
      3. Picking the Right Output Value
      4. Decoding
      5. Practical Implementations
    4. Asymmetric Numeral Systems
      1. Encoding and Decoding Using a Transform Table
      2. Creating the Reference Table
      3. Using ANS for Compression
      4. Decoding Example
      5. So Where Does the Compression Come From?
    5. Practical Compression: Which Statistical Algorithm Do I Choose?
  8. 6. Adaptive Statistical Encoding
    1. Locality Matters for Entropy
    2. Adaptive VLC Encoding
      1. Dynamically Building a VLC Table
      2. Literals
      3. Resets
      4. Knowing When to Reset
      5. Using This in Practice
    3. Adaptive Arithmetic Coding
    4. Adaptive Huffman Coding
    5. The Modern Choice
  9. 7. Dictionary Transforms
    1. A Basic Dictionary Transform
      1. Finding the Right “Words”
    2. The Lempel-Ziv Algorithm
      1. How LZ Works
      2. Encoding
      3. Decoding
      4. Compressing LZ output
      5. LZ Variants
    3. Collect Them All!
  10. 8. Contextual Data Transforms
    1. Run-Length Encoding
      1. Dealing with Short Runs
      2. Compressing
    2. Delta Coding
      1. XOR Delta Coding
      2. Frame of Reference Delta Coding
      3. Patched Frame of Reference Delta Coding
      4. Compressing Delta-Encoded Data
      5. Does It Work on Text?
    3. Move-to-Front Coding
      1. Avoiding Rogue Symbols
      2. Compressing MTF
    4. Burrows–Wheeler Transform
      1. Ordering Is Important!
      2. How BWT Works
      3. Inverse BWT
      4. Practical Implementations
      5. Compressing BWT
  11. 9. Data Modeling
    1. The Chains of Markov
      1. Markov and Compression
      2. Practical Implementations
    2. Prediction by Partial Matching
      1. The Search Trie
      2. Compressing a Symbol
      3. Choosing a Sensible N Value
      4. Dealing with Unknown Symbols
    3. Context Mixing
      1. Types of Models
      2. Types of Mixing
    4. The Next Big Thing?
  12. 10. Switching Gears
    1. Media-Specific Compression
    2. General-Purpose Compression
    3. Compression in Practice
  13. 11. Evaluating Compression
    1. Compression Usage Scenarios
      1. Compressed Offline, Decompressed On-Client
      2. Compressed On-Client, Decompressed In-Cloud
      3. Compressed In-Cloud, Decompressed On-Client
      4. Compressed On-Client, Decompressed On-Client
    2. Compression Need
    3. Compression Ratio
    4. Compression Performance
    5. Decompression Performance
    6. Ability to Decode-Stream
    7. Comparing Compressors
  14. 12. Compressing Image Data Types
    1. Understanding Quality Versus File Size
      1. What Reduces Image Quality?
      2. Measuring Image Quality
      3. Making This Work
    2. Image Dimensions Are Important
    3. Choosing the Correct Image Format
      1. PNG
      2. JPG
      3. GIF
      4. WebP
      5. And Now for Choosing...
    4. GPU Texture Formats
    5. Vector Formats
    6. Eyes on the Prize
  15. 13. Serialized Data
    1. Understanding Common Use Cases
      1. Dynamically Server-Built Data
      2. Statically Built Server-Owned Data
      3. Dynamically Client-Built Data
      4. Statically Client-Owned Data
    2. Issues with Serialized Formats
      1. Human-Readable Text
      2. Slow Decode Times
    3. Smaller Serialized Data
      1. Use a Binary Serialization Format
      2. Restructure Lists for Better Compression
      3. Organize for Efficient Fetching
      4. Segment Out Data into the Proper Compression Format
  16. 14. Lossy Data Compression
  17. 15. Making the World a Little Smaller
    1. Data Compression and You
    2. Data Compression and the Bottom Line
      1. User Acquisition and Retention
      2. Running Costs
      3. Planning Ahead
    3. Making Your Users’ Lives a Little More Magical and Less Expensive
    4. Thinking About What’s Next in Technology
      1. The Next Five Billion Users
      2. Mobile Networks
    5. ...Starting Now
  18. Glossary of Compression Words
  19. Index

Product information

  • Title: Understanding Compression
  • Author(s): Colt McAnlis, Aleks Haecky
  • Release date: July 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491961483