Every day, companies struggle to scale critical applications. As traffic volume and data demands increase, these applications become more complicated and brittle, exposing risks and compromising availability. This practical guide shows IT, DevOps, and system reliability managers how to prevent an application from becoming slow, inconsistent, or downright unavailable as it grows.
Scaling isn’t just about handling more users; it’s also about managing risk and ensuring availability. Author Lee Atchison provides basic techniques for building applications that can handle huge quantities of traffic, data, and demand without affecting the quality your customers expect.
In five parts, this book explores:
Availability: learn techniques for building highly available applications, and for tracking and improving availability going forward
Risk management: identify, mitigate, and manage risks in your application, test your recovery/disaster plans, and build out systems that contain fewer risks
Services and microservices: understand the value of services for building complicated applications that need to operate at higher scale
Scaling applications: assign services to specific teams, label the criticalness of each service, and devise failure scenarios and recovery plans
Cloud services: understand the structure of cloud-based services, resource allocation, and service distribution
Chapter 1: What Is Availability?
Availability Versus Reliability
What Causes Poor Availability?
Chapter 2: Five Focuses to Improve Application Availability
Focus #1: Build with Failure in Mind
Focus #2: Always Think About Scaling
Focus #3: Mitigate Risk
Focus #4: Monitor Availability
Focus #5: Respond to Availability Issues in a Predictable and Defined Way
Chapter 3: Measuring Availability
Don’t Be Fooled
Availability by the Numbers
Chapter 4: Improving Your Availability When It Slips
Measure and Track Your Current Availability
Automate Your Manual Processes
Improve Your Systems
Your Changing and Growing Application
Keeping on Top of Availability
Chapter 5: What Is Risk Management?
Remove Worst Offenders
Managing Risk Summary
Chapter 6: Likelihood Versus Severity
The Top 10 List: Low Likelihood, Low Severity Risk
The Order Database: Low Likelihood, High Severity Risk
Custom Fonts: High Likelihood, Low Severity Risk
T-Shirt Photos: High Likelihood, High Severity Risk
Chapter 7: The Risk Matrix
Scope of the Risk Matrix
Creating the Risk Matrix
Using the Risk Matrix for Planning
Maintaining the Risk Matrix
Chapter 8: Risk Mitigation
Disaster Recovery Plans
Improving Our Risk Situation
Chapter 9: Game Days
Staging Versus Production Environments
Concerns with Running Game Days in Production
Game Day Testing
Chapter 10: Building Systems with Reduced Risk
Examples of Idempotent Interfaces
Redundancy Improvements That Increase Complexity
Services and Microservices
Chapter 11: Why Use Services?
The Monolith Application
The Service-Based Application
The Ownership Benefit
The Scaling Benefit
Chapter 12: Using Microservices
What Should Be a Service?
Going Too Far
The Right Balance
Chapter 13: Dealing with Service Failures
Cascading Service Failures
Responding to a Service Failure
Chapter 14: Two Mistakes High
What Is “Two Mistakes High”?
“Two Mistakes High” in Practice
Managing Your Applications
The Space Shuttle
Chapter 15: Service Ownership
Single Team Owned Service Architecture
Advantages of a STOSA Application and Organization
What Does It Mean to Be a Service Owner?
Chapter 16: Service Tiers
What Are Service Tiers?
Assigning Service Tier Labels to Services
Example: Online Store
Chapter 17: Using Service Tiers
Chapter 18: Service-Level Agreements
What Are Service-Level Agreements?
External Versus Internal SLAs
Why Are Internal SLAs Important?
SLAs as Trust
SLAs for Problem Diagnosis
Performance Measurements for SLAs
How Many and Which Internal SLAs?
Additional Comments on SLAs
Chapter 19: Continuous Improvement
Examine Your Application Regularly
Where’s the Data?
The Importance of Continuous Improvement
Chapter 20: Change and the Cloud
What Has Changed in the Cloud?
Chapter 21: Distributing the Cloud
Availability Zones Are Not Data Centers
Maintaining Location Diversity for Availability Reasons
Chapter 22: Managed Infrastructure
Structure of Cloud-Based Services
Implications of Using Managed Resources
Implications of Using Non-Managed Resources
Monitoring and CloudWatch
Chapter 23: Cloud Resource Allocation
Allocated-Capacity Resource Allocation
Usage-Based Resource Allocation
The Pros and Cons of Resource Allocation Techniques
Lee Atchison is the Principal Cloud Architect and Advocate at New Relic. He’s been with New Relic for four years, where he led the building of the New Relic infrastructure products and helped New Relic architect a solid service-based system. He has specific expertise in building highly available systems.
Lee has 28 years of industry experience, and learned cloud-based, scalable systems during his seven years as a Senior Manager at Amazon.com, where among other things he led the creation of AWS Elastic Beanstalk.
The animal on the cover of Architecting for Scale is a textile cone sea snail (Conus textile). It is also known as the "cloth of gold cone" due to the unique yellow-brown and white color pattern of its shell, which usually grows to about three to four inches in length. The textile cone is found in the shallow waters of the Red Sea, off the coasts of Australia and West Africa, and in the tropical regions of the Indian and Pacific oceans.
Like other members of the genus Conus, the textile cone is predatory and feeds on other snails, killing its prey by injecting it with venom from a "radula," an appendage that resembles a small needle. The "conotoxin" used by the textile cone is extremely dangerous and can cause paralysis or death.
The textile cone reproduces by laying several hundred eggs at once, which grow on their own into adult snails. Its shells are sometimes sold as trinkets, but the textile cone is plentiful and its population is not threatened or endangered.
Perhaps the real "meat" will arrive in later chapters? I found the content rather "lightweight". I usually write notes, but had none after reading. Perhaps adequate for a total novice to systems and architecture.
Bottom Line: No, I would not recommend this to a friend.