This article summarizes the essential principles, patterns, and practices of cloud application architecture and design that can increase organizational innovation and decrease long-term system maintenance costs. My primary sources are the architecture teams at Amazon Web Services and Microsoft Azure, as well as my own professional experience applying these ideas daily to production systems. Subsequent articles will focus on processes and technologies specific to AWS and Azure.
Principles |
Patterns |
Practices |
Scalability |
Decomposition Partitioning Stateless Elasticity Caching CDN Queue Worker Pipes and Filters Encapsulation Materialized View Eventual Consistency |
Understand customer SLA for performance. Measure and profile performance with load benchmarks. Partition and decompose workloads into discrete units. Partition around data, network, and compute limits. Design for horizontal scalability (scaling out/in vs. up/down). Ensure applications and services are stateless. Avoid client affinity and server-side session state. Minimize coordination and shared state. Queue I/O- and CPU-intensive requests as background tasks. Distribute background tasks across multiple workers. Cache items that change infrequently. Use a CDN for caching static data. Reduce chatty interactions between components. Build golden component images using Docker. Leverage PaaS auto-scaling features with golden images. Consider compression and a binary format for DTO transfer. Optimize SQL indexes and queries. Consider a document DB or de-normalizing the data model. Avoid locking database resources. Prefer optimistic concurrency and eventual consistency. Minimize the time that connections and resources are in use. Minimize the number of connections required. |
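The practices "queue I/O and CPU intensive requests as background tasks" and "distribute background tasks across multiple workers" can be sketched in a few lines. This is a minimal, in-process illustration only: the function name `run_workers` and its parameters are hypothetical, and a production system would more likely use a durable message queue provided by the cloud platform rather than Python's in-memory `queue.Queue`.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=4):
    """Process tasks on a pool of worker threads fed by one shared queue.

    Hypothetical helper for illustration; returns (task, outcome) pairs
    in completion order, where outcome is the handler's result or the
    exception it raised.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                return
            try:
                outcome = handler(item)
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = exc
            with lock:
                results.append((item, outcome))
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for task in tasks:
        work.put(task)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return results
```

Because the workers share nothing except the queue, adding capacity is a matter of raising `worker_count` (or, with a real message queue, starting more worker instances), which is the scale-out behavior the table recommends.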
Resiliency |
Redundancy Load Balancing Retry Circuit Breaker Replication Healthchecks Telemetry |
Understand customer SLA for availability. Analyze the system to identify failures, their impact, and recovery paths. Use redundant components to minimize single points of failure. Use load balancing to distribute requests. Handle transient failures with limited retries and backoff. Handle persistent failures with a circuit breaker that falls back to a reasonable action while the dependency is unavailable. Use multiple availability zones. Monitor the health of dependencies and endpoints. Checkpoint long-running transactions. Design for failure and self-healing. Understand replication methods for data sources. Automate persistent data backup. Document failover/failback processes and test them. Throttle excessively active clients. Block bad actors (DDoS). Perform fault injection testing to verify system resiliency. |
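The difference between handling transient failures (limited retries with backoff) and persistent failures (a circuit breaker with a fallback) is easier to see in code. The sketch below is an assumption-laden illustration, not a reference implementation: `TransientError`, `retry_with_backoff`, and `CircuitBreaker` are hypothetical names, and the thresholds are arbitrary defaults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry a transient failure a limited number of times, backing off each time."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the caller decide
            delay = base_delay * (2 ** attempt)      # exponential backoff
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries


class CircuitBreaker:
    """Stop calling a failing dependency and use a fallback until it recovers."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: do not touch the dependency
            # Otherwise half-open: allow one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

A caller might wrap a remote lookup as `breaker.call(lambda: retry_with_backoff(fetch), lambda: cached_value)`, where `fetch` and `cached_value` stand in for whatever the application actually uses.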
Security |
Defense in Depth Least Privilege Traceability Federated Identity Gatekeeper Compartmentalize |
Apply defense in depth; secure all resources, not just the edges. Secure the weakest link. Trust reluctantly and verify. Fail securely. Pay attention to data privacy and residency requirements. Protect data at rest (storage encryption) and in transit (TLS). Mitigate DDoS using the cloud platform’s network layer. Enforce ACLs at the network, application, and data layers. Conduct vulnerability analysis and penetration tests. Manage keys carefully and secure them with hardware tokens. Use SSO, multi-factor authentication, and federated identity. Use anti-virus and anti-malware for network and host nodes. Simplify BCDR through PaaS-centric, automated backup and recovery. Integrate diagnostics from the network, application, and data layers to monitor the system and correlate intrusions across the enterprise. Prefer connecting cloud resources to on-prem resources over dedicated, private WAN links rather than VPN tunnels over public links. |
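"Least privilege" and "fail securely" both reduce to the same rule at the application layer: grant access only when a rule explicitly allows it, and deny on anything unexpected. The following is a small sketch of that idea. The `Principal` type, the `POLICY` table, and the resource and role names are all invented for illustration; a real system would normally delegate this to the platform's identity and access management service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """Hypothetical caller identity with a set of role names."""
    name: str
    roles: frozenset = field(default_factory=frozenset)


# Hypothetical policy: each (resource, action) pair lists the roles allowed.
# Anything not listed here is denied.
POLICY = {
    ("orders", "read"): {"support", "admin"},
    ("orders", "delete"): {"admin"},
}


def is_allowed(principal, resource, action, policy=POLICY):
    """Return True only when a rule explicitly permits the request."""
    try:
        allowed_roles = policy[(resource, action)]
        return bool(principal.roles & allowed_roles)
    except Exception:
        # Fail securely: a missing rule or malformed input means "deny".
        return False
```

The important design choice is the default: an unknown resource, an unknown action, or a caller with no roles all end in a denial rather than an error path that might leak access.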
Application Design |
High Cohesion Loose Coupling Single Responsibility Open/Closed Interface Segregation Dependency Inversion DDD CQRS RESTful Web API Messaging |
Design with the organization's goals and the end user in mind. Design for evolution and change. Prefer loosely coupled components that communicate asynchronously, so that they can evolve, heal, and scale more readily. Separate infrastructure logic from domain logic. Prefer RESTful Web APIs for external communication. Prefer asynchronous messaging for internal communication. |
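"Separate infrastructure logic from domain logic" pairs naturally with the Dependency Inversion pattern listed above: the domain code depends on an abstraction, and the storage technology is supplied from outside. Here is a minimal sketch under assumed names (`Order`, `OrderRepository`, `PayOrder`, and `InMemoryOrderRepository` are all hypothetical).

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    total: float
    paid: bool = False


class OrderRepository(Protocol):
    """Abstraction the domain depends on; no storage details leak through."""

    def get(self, order_id: str) -> Order: ...

    def save(self, order: Order) -> None: ...


class PayOrder:
    """Domain logic: knows the business rule, not where orders are stored."""

    def __init__(self, repo: OrderRepository):
        self._repo = repo

    def execute(self, order_id: str) -> Order:
        order = self._repo.get(order_id)
        if order.paid:
            raise ValueError("order already paid")
        order.paid = True
        self._repo.save(order)
        return order


class InMemoryOrderRepository:
    """Infrastructure stand-in; a SQL or document DB adapter could replace it."""

    def __init__(self):
        self._orders = {}

    def get(self, order_id):
        return self._orders[order_id]

    def save(self, order):
        self._orders[order.order_id] = order
```

Swapping the in-memory repository for a database-backed one requires no change to `PayOrder`, which is what lets the domain evolve independently of infrastructure choices.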
Management |
Telemetry Automation Source Control Agile |
Design for IT Ops (Deploy, Monitor, Investigate, Secure). Document the system release process and use change control. Automate system build and deployment processes. Implement logging and alerting in systems. Instrument to analyze the root cause of errors. Instrument to monitor availability, performance, and health. Standardize log formats and metrics. Inventory, inspect, and audit cloud assets. Use distributed tracing (asynchronous, Correlation ID). Version and control configuration like other system artifacts. Use an Agile project methodology for iterative development. |
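Two of the practices above, "standardize log formats" and "use distributed tracing (asynchronous, Correlation ID)", can be combined in one small example using only the Python standard library. The names `correlation_id`, `JsonFormatter`, and `handle_request` are assumptions for this sketch, and the JSON field names are arbitrary; a real deployment would likely adopt the conventions of its tracing or logging platform.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's correlation ID; safe across threads and asyncio tasks.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit every log record as one JSON object with a fixed set of fields."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(logger, incoming_id=None):
    """Reuse the caller's correlation ID if present, otherwise create one."""
    token = correlation_id.set(incoming_id or str(uuid.uuid4()))
    try:
        logger.info("request started")
        # Downstream calls would forward correlation_id.get() to the next service.
        logger.info("request finished")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

With every service logging the same fields and forwarding the same ID, the log entries for one request can be collected across components by filtering on `correlation_id`.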
References