Saturday, August 26, 2017

Principles and Practices of Cloud Applications

This article summarizes the essential principles, patterns, and practices of cloud application architecture and design that can increase organizational innovation and decrease long-term system maintenance costs. My primary sources are the architecture teams at Amazon Web Services and Microsoft Azure, along with my professional experience applying these ideas daily to production systems. Subsequent articles will focus on processes and technologies specific to AWS and Azure.

Each principle below is followed by its patterns and then its practices.

Scalability

Patterns: Decomposition, Partitioning, Stateless, Elasticity, Caching, CDN, Queue Worker, Pipes and Filters, Encapsulation, Materialized View, Eventual Consistency

Practices:

Understand customer SLA for performance.

Measure and profile performance with load benchmarks.

Partition and decompose workloads into discrete units.

Partition around data, network, and compute limits.

Design for horizontal scalability (scaling out/in vs. up/down).

Ensure applications and services are stateless.

Avoid client affinity and server-side session state.

Minimize coordination and shared state.

Queue I/O-intensive and CPU-intensive requests as background tasks.

Distribute background tasks across multiple workers.

Cache items that change infrequently.

Use CDN for caching static data.

Reduce chatty interactions between components.

Build golden component images using Docker.

Leverage PaaS auto-scaling features with golden images.

Consider compression and binary format for DTO transfer.

Optimize SQL indexes and queries.

Consider a document database or a denormalized data model.

Avoid locking database resources.

Prefer optimistic concurrency and eventual consistency.

Minimize time that connections and resources are in use.

Minimize number of connections required.
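As a minimal sketch of the Queue Worker practices above (queue intensive requests as background tasks and distribute them across multiple workers), the following uses an in-process queue and threads. This is an assumption made for illustration: a production system would more likely use a managed queue service and separate worker processes. The function name `run_workers` and its parameters are hypothetical.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=4):
    """Run handler(payload) for every task on a pool of worker threads.

    Returns the outcomes in task order. If a handler call raises, the
    exception object is stored as that task's outcome instead of killing
    the worker.
    """
    tasks = list(tasks)
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                work.task_done()
                return
            task_id, payload = item
            try:
                outcome = handler(payload)
            except Exception as exc:  # keep the worker alive on task failure
                outcome = exc
            with lock:
                results[task_id] = outcome
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # Producer side: enqueue the work, then one sentinel per worker.
    for task_id, payload in enumerate(tasks):
        work.put((task_id, payload))
    for _ in threads:
        work.put(None)

    work.join()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(tasks))]
```

Because the workers share no state other than the queue, adding workers is a matter of changing `worker_count`, which mirrors the scale-out approach described above.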

Resiliency

Patterns: Redundancy, Load Balancing, Retry, Circuit Breaker, Replication, Healthchecks, Telemetry

Practices:

Understand customer SLA for availability.

Analyze system to identify failures, impact, and recovery.

Use redundant components to minimize single point of failure.

Use load balancing to distribute requests.

Handle transient failures with limited retries and backoff.

Handle persistent failures with a circuit breaker that falls back to a reasonable action while the dependency is unavailable.

Use multiple availability zones.

Monitor health of dependencies and endpoints.

Checkpoint long-running transactions.

Design for failure and self-healing.

Understand replication methods for data sources.

Automate persistent data backup.

Document failover/failback processes and test them.

Throttle excessively active clients. Block bad actors (DDoS).

Perform fault injection testing to verify system resiliency.
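The Retry and Circuit Breaker practices above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a production implementation: the names `retry_with_backoff` and `CircuitBreaker`, the default thresholds, and the choice of which exception types count as transient are all hypothetical, and jitter is omitted to keep the example deterministic.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=0.1,
                       transient=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Retry a transient failure a limited number of times.

    The delay doubles after each failed attempt (exponential backoff).
    The last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except transient:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


class CircuitBreaker:
    """Stop calling a failing dependency and use a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: skip the dependency entirely
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller might wrap a remote lookup as `breaker.call(fetch_price, lambda: cached_price)`, where `fetch_price` and `cached_price` are placeholders for the real dependency and its reasonable fallback.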

Security

Patterns: Defense in Depth, Least Privilege, Traceability, Federated Identity, Gatekeeper, Compartmentalize

Practices:

Apply defense in depth; secure all resources, not just the edges.

Secure the weakest link. Trust reluctantly and verify. Fail securely.

Pay attention to data privacy and residency requirements.

Protect data at rest (storage encryption) and in transit (SSL/TLS).

Mitigate DDoS using cloud platform’s network layer.

Enforce ACLs at network, application, and data layers.

Conduct vulnerability analysis and penetration tests.

Manage keys carefully and secure them with hardware tokens.

Use SSO, multi-factor authentication, and federated identity.

Use anti-virus and anti-malware for network and host nodes.

Simplify BCDR through PaaS-centric, automated backup and recovery.

Integrate diagnostics from the network, application, and data layers to monitor the system and correlate intrusions across the enterprise.

Prefer dedicated, private WAN links over VPN tunnels across public links for connectivity from cloud to on-prem resources.
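To illustrate Least Privilege and "fail securely" at the application layer, here is a small sketch of a deny-by-default ACL check. The resource names, roles, and the `ACL` table are hypothetical; a real system would typically load these from its identity provider or policy store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    roles: frozenset = frozenset()


# Hypothetical ACL: resource -> action -> roles allowed to perform it.
ACL = {
    "orders": {
        "read": {"support", "admin"},
        "write": {"admin"},
    },
}


def is_allowed(principal, resource, action, acl=ACL):
    """Return True only when a role of the principal is explicitly granted.

    Unknown resources and unknown actions resolve to an empty role set,
    so the check denies by default rather than allowing by accident.
    """
    allowed_roles = acl.get(resource, {}).get(action, set())
    return bool(principal.roles & allowed_roles)
```

The design choice worth noting is that access is granted only by an explicit entry; a missing or misspelled resource results in a denial.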

Application Design

Patterns: High Cohesion, Loose Coupling, Single Responsibility, Open/Closed, Interface Segregation, Dependency Inversion, DDD, CQRS, RESTful Web API, Messaging

Practices:

Design with the organization's goals and the end user in mind.

Design for evolution and change.

Prefer loosely coupled components that communicate asynchronously so they can evolve, heal, and scale independently.

Separate infrastructure logic from domain logic.

Prefer RESTful Web APIs for external communication.

Prefer asynchronous messaging for internal communication.
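One way to separate infrastructure logic from domain logic is Dependency Inversion: the domain code depends on small interfaces, and the infrastructure supplies the implementations. The sketch below assumes a hypothetical order-placement domain; `Order`, `OrderService`, the two protocols, and the in-memory adapters are illustrative names only.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    total: float


class OrderRepository(Protocol):
    def save(self, order: Order) -> None: ...


class EventPublisher(Protocol):
    def publish(self, topic: str, message: dict) -> None: ...


class OrderService:
    """Domain logic: knows the business rule, not the database or the broker."""

    def __init__(self, repository: OrderRepository, publisher: EventPublisher):
        self.repository = repository
        self.publisher = publisher

    def place_order(self, order_id: str, total: float) -> Order:
        if total <= 0:
            raise ValueError("order total must be positive")
        order = Order(order_id, total)
        self.repository.save(order)
        # Other components react to this message asynchronously.
        self.publisher.publish("order-placed", {"order_id": order_id, "total": total})
        return order


class InMemoryOrderRepository:
    """Infrastructure stand-in; a real adapter would talk to a data store."""

    def __init__(self):
        self.orders = {}

    def save(self, order: Order) -> None:
        self.orders[order.order_id] = order


class InMemoryPublisher:
    """Infrastructure stand-in; a real adapter would send to a message broker."""

    def __init__(self):
        self.messages = []

    def publish(self, topic: str, message: dict) -> None:
        self.messages.append((topic, message))
```

Swapping the in-memory adapters for real ones would not require any change to `OrderService`, which is the point of keeping the two kinds of logic apart.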

 

Management

Patterns: Telemetry, Automation, Source Control, Agile

Practices:

Design for IT Ops (Deploy, Monitor, Investigate, Secure)

Document system release process and use change control.

Automate system build and deployment processes.

Implement logging and alerting into systems.

Instrument to analyze root cause of errors.

Instrument to monitor availability, performance, and health.

Standardize log formats and metrics. 

Inventory, inspect, and audit cloud assets.

Use distributed tracing with correlation IDs, including across asynchronous calls.

Version and control configuration like other system artifacts.

Use an Agile project methodology for iterative development.
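The practices of standardizing log formats and tracing requests with a Correlation ID can be sketched with Python's standard `logging` and `contextvars` modules. The field names, the `build_logger` and `handle_request` helpers, and the choice of JSON as the log format are assumptions for illustration; a timestamp field is left out only to keep the example short.

```python
import contextvars
import json
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit every log record as one JSON object with a fixed set of fields."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }, sort_keys=True)


def build_logger(name, stream):
    """Create a logger that writes standardized JSON lines to the stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    """Reuse the caller's correlation ID, or mint one at the system edge."""
    token = correlation_id.set(incoming_id or str(uuid.uuid4()))
    try:
        logger.info("request received")
        logger.info("request completed")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

Passing the returned ID along with any outgoing message or HTTP call lets downstream services log the same value, so one request can be followed across components.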

 

References

https://aws.amazon.com/architecture/

https://docs.microsoft.com/en-us/azure/architecture/