Designing Data-Intensive Applications
The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Publisher: O'Reilly Media
Final Release Date: September 2014
Pages: 562

With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.

Data is at the center of many great challenges in system design. There are so many tools to choose from: databases, NoSQL datastores, stream and batch processors, caches, indexes, message brokers, and so on. Moreover, there are so many issues to consider: scalability, consistency, reliability, efficiency, maintainability. How do you make the right choices for your application? How do you make sense of all the buzzwords?

Designing Data-Intensive Applications is a comprehensive guide to the landscape of systems for storing and processing data. In this book, Martin Kleppmann covers a wide range of popular technologies, comparing their pros and cons. Although software keeps changing, the fundamental ideas behind it stay the same. Through this book, you’ll understand those principles, how they apply in practice, and how to make full use of data in your applications.

With this book, you will:

  • Look under the hood of the systems you already use, so that you can use them more effectively and diagnose any issues
  • Know the strengths and weaknesses of different tools, letting you make informed decisions
  • Learn to navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
  • Understand the distributed systems research upon which modern databases are built
  • Peek behind the scenes of major online services, and learn from their experience
Customer Reviews

REVIEW SNAPSHOT®

 
5.0 (based on 29 reviews)

Ratings Distribution

  • 5 Stars (28)
  • 4 Stars (1)
  • 3 Stars (0)
  • 2 Stars (0)
  • 1 Star (0)

100% of respondents would recommend this to a friend.

Pros

  • Well-written (22)
  • Easy to understand (17)
  • Accurate (16)
  • Concise (14)
  • Helpful examples (10)

Cons

No Cons

Best Uses

  • Intermediate (22)
  • Expert (14)
  • Student (9)
  • Novice (6)

Reviewer Profile: Developer (18), Designer (9), Sys admin (5), Educator (3), Maker (3)

Reviewed by 29 customers

Displaying reviews 1-10


 
5.0

Should be Required Reading In Uni.

By David

from Madrid

About Me Sys Admin

Verified Reviewer

Pros

  • Accurate
  • Easy to understand
  • Well-written

Cons

    Best Uses

    • Intermediate
    • Student

    Comments about oreilly Designing Data-Intensive Applications:

    It should be required reading for any student taking IT-related studies. Precise, technical, yet easy to follow. Very well structured, with an extraordinary last section.

     
    5.0

    Excellent book!

    By Mario Renau

    from Valencia (Spain)

    About Me Designer

    Verified Buyer

    Pros

    • Accurate
    • Easy to understand
    • Well-written

    Cons

      Best Uses

      • Expert
      • Intermediate

      Comments about oreilly Designing Data-Intensive Applications:

      The author describes very well the principles of current distributed data systems.
      I highly recommend this book to anyone needing to understand the basis for robust data driven applications.

       
      5.0

      The best book out there on design of apps

      By Arthur Ronald

      from Rio de Janeiro, RJ

      About Me Designer, Developer, Sys Admin

      Verified Reviewer

      Pros

      • Accurate
      • Concise
      • Helpful examples
      • Well-written

      Cons

        Best Uses

        • Expert
        • Intermediate

        Comments about oreilly Designing Data-Intensive Applications:

        I've never seen such a well-written and practical book on software design. It covers a broad range of topics in an advanced fashion. In that way, it can take you to another level of knowledge.

        Advice: I don't recommend this book if you aren't an experienced developer or architect.

        (1 of 1 customers found this review helpful)

         
        5.0

        Best book I've read in 2016

        By edude03

        from Toronto, Ontario

        About Me Developer, Educator, Maker, Sys Admin

        Verified Reviewer

        Pros

        • Accurate
        • Concise
        • Easy to understand
        • Well-written

        Cons

          Best Uses

          • Expert
          • Intermediate

          Comments about oreilly Designing Data-Intensive Applications:

          A great book to get you ramped up on handling data!

           
          5.0

          Extraordinarily strong overview of building data systems

          By danmil

          from Boston, MA

          About Me Developer

          Verified Buyer

          Pros

          • Accurate
          • Well-written

          Cons

            Best Uses

            • Expert
            • Intermediate
            • Novice

            Comments about oreilly Designing Data-Intensive Applications:

            Tremendous combo of rigorous understanding with real-world applicability.

             
            5.0

            One of the best books in the field!

            By Ryan Worsley

            from London, UK

            About Me Designer, Developer

            Verified Buyer

            Pros

            • Accurate
            • Concise
            • Easy to understand
            • Helpful examples
            • Well-written

            Cons

              Best Uses

              • Expert
              • Intermediate

              Comments about oreilly Designing Data-Intensive Applications:

              Kleppmann does an outstanding job of both being exhaustive in his analysis of distributed systems technology and remaining engaging, thoughtful and interesting throughout in what is often a dry subject. 5/5 would study again.

               
              5.0

              Worth reading

              By simon

              from Germany

              About Me Developer, Engineer

              Verified Buyer

              Pros

              • Concise
              • Easy to understand
              • Well-written

              Cons

                Best Uses

                • Novice
                • Student

                Comments about oreilly Designing Data-Intensive Applications:

                This book is of very good quality for its early release state. The text is didactically well written. The examples are easy to follow and nicely supplement the discussed concepts. The author presents a broad spectrum of technologies, their pros/cons and some real-world use cases. Every chapter ends with a list of resources for further exploration. Exactly what I needed.

                 
                5.0

                Great book!

                By Mikhail

                from Russia

                About Me Developer

                Verified Buyer

                Pros

                • Accurate
                • Concise
                • Well-written

                Cons

                  Best Uses

                  • Expert
                  • Intermediate

                  Comments about oreilly Designing Data-Intensive Applications:

                  Great resource for any developer or architect

                  (0 of 1 customers found this review helpful)

                   
                  5.0

                  Excellent technology overview

                  By J. Vogler

                  from Stuttgart, Germany

                  About Me Designer

                  Verified Reviewer

                  Pros

                  • Concise
                  • Easy to understand
                  • Helpful examples
                  • Well-written

                  Cons

                    Best Uses

                    • Expert
                    • Intermediate

                    Comments about oreilly Designing Data-Intensive Applications:

                    This book is one of the best technical books I have recently read. It gives a very good overview of current IT architectures and components for dealing with data in complex data-driven applications. It covers both the theory and the state-of-the-art implementations for the different tasks. It goes as deep as necessary to understand the concepts. It is easy to read but not superficial.

                    I can recommend this book to anyone who needs to design robust reliable architectures for data driven applications.

                    (3 of 4 customers found this review helpful)

                     
                    5.0

                    Mini-encyclopedia of Modern Data Engineering!

                    By Emre Sevinc

                    from Antwerp, Belgium

                    About Me Designer, Developer, Educator

                    Verified Reviewer

                    Pros

                    • Accurate
                    • Helpful examples
                    • Well-written

                    Cons

                      Best Uses

                      • Expert
                      • Intermediate
                      • Student

                      Comments about oreilly Designing Data-Intensive Applications:

                      I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practical guide or a cookbook for a particular Big Data, NoSQL or NewSQL product. What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it.

                      If you are after the obscure details of a particular product, or some tutorials and "how-to"s, go elsewhere. But if you want to understand the main principles and issues, as well as the challenges, of data-intensive and distributed systems, you've come to the right place.

                      Martin Kleppmann starts out by solidly giving the reader the conceptual framework in the first chapter: what does reliability mean? How is it defined? What is the difference between a "fault" and a "failure"? How do you describe load on a data-intensive system? How do you talk about performance and scalability in a meaningful way? What does it mean to have a "maintainable" system?

                      The second chapter gives a brief overview of different data models and shows the suitability of each to different use cases, using modern challenges that companies such as Twitter faced. This chapter is a solid foundation for understanding the difference between the relational data model, document data model, and graph data model, as well as the languages used for processing data stored using these models.

                      The third chapter goes into a lot of detail regarding the building blocks of different types of database systems: the data structures and algorithms used for the different systems shown in the previous chapter are described, and you get to know hash indexes, SSTables (Sorted String Tables), Log-Structured Merge trees (LSM-trees), B-trees and other data structures. Following this, you are introduced to column databases and the underlying principles and structures behind them.

                      Following these, the book describes the methods of data encoding, starting from the venerable XML and JSON, and going into the details of formats such as Avro, Thrift and Protocol Buffers, showing the trade-offs between these choices.

                      Following the building blocks and foundations above comes Part II of the book, and this is where things start to get really interesting, because now the reader starts to learn about the challenging topic of distributed systems: how to use the basic building blocks in a setting where anything can go wrong in the most unexpected ways. Part II is the most complex part of the book: you learn about how to replicate your data, what happens when replication lags behind, how you provide a consistent picture to the end-user or the end-programmer, what algorithms are used for leader election in consensus systems, and how leaderless replication works.

                      One of the primary purposes of using a distributed system is to have an advantage over a single, central system, and that advantage is to provide better service, meaning a more resilient service with an acceptable level of responsiveness. This means you need to distribute the load and your data, and there are a lot of schemes for partitioning your data. Chapter 6 of Part II provides a lot of detail on partitioning, keys, indexes, secondary indexes and how to handle data queries when your data is partitioned using various methods.

                      No data systems book can be complete without touching the topic of transactions, and this book is not an exception to the rule. You learn about the fuzziness surrounding the definition of ACID, isolation levels, and serializability.

                      The remaining two chapters of Part II, Chapters 8 and 9, are probably the most interesting part of the book. You are now ready to learn the gory details of how to deal with all kinds of network and other types of faults to keep your data system in a usable and consistent state, the problems with the CAP theorem, version vectors and why they are not vector clocks, Byzantine faults, how to have a sense of causality and ordering in a distributed system, why algorithms such as Paxos, Raft, and ZAB (used in ZooKeeper) exist, distributed transactions, and many more topics.

                      The rest of the book, that is Part III, is dedicated to batch and stream processing. The author describes the famous MapReduce batch processing model in detail, and briefly touches upon the modern frameworks for distributed data processing such as Apache Spark. The final chapter discusses event streams and messaging systems and the challenges that arise when trying to process this "data in motion". You might not be in the business of building the next generation streaming system, but you'll definitely need to have a handle on these topics because you'll encounter the described issues in the practical stream processing systems that you deal with daily as a data engineer.

                      As I said in the opening of this review, consider this a mini-encyclopedia for the modern data engineer, and also don't be surprised if you see more than 100 references at the end of some chapters; if the author tried to include most of them in the text itself, the book would well go beyond 2000 pages!

                      At the time of writing, the book is 90% complete; according to its official site, there's only one more chapter to be added (Chapter 12: Materialized Views and Caching). So it is safe to say that I recommend this book to anyone working with distributed big data systems, dealing with NoSQL and NewSQL databases, document stores, column-oriented data stores, streaming and messaging systems. As for me, it'll definitely be my go-to reference on these topics for years to come.


                       
                      Pre-Order  Print:  $44.99
                      March 2017 (est.)