Neoclassical Narratives: cloud

This article summarizes the essential principles, patterns, and practices of cloud application architecture and design that can increase organizational innovation and decrease long-term system maintenance costs. My primary sources are the architecture teams at Amazon Web Services and Microsoft Azure, along with my professional experience applying these ideas daily to production systems. Subsequent articles will focus on processes and technologies specific to AWS and Azure.

Principles

Patterns

Practices

Scalability

Decomposition

Partitioning

Stateless

Elasticity

Caching

CDN

Queue Worker

Pipes and Filters

Encapsulation

Materialized View

Eventual Consistency

Understand customer SLA for performance.

Measure and profile performance with load benchmarks.

Partition and decompose workloads into discrete units.

Partition around data, network, and compute limits.

Design for horizontal scalability (scaling out/in vs. up/down).

Ensure applications and services are stateless.

Avoid client affinity and server-side session state.

Minimize coordination and shared state.

Queue I/O- and CPU-intensive requests as background tasks.

Distribute background tasks across multiple workers.

Cache items that don’t change much.

Use CDN for caching static data.

Reduce chatty interactions between components.

Build golden component images using Docker.

Leverage PaaS auto-scaling features with golden images.

Consider compression and binary format for DTO transfer.

Optimize SQL indexes and queries.

Consider document DB or de-normalizing data model.

Avoid locking database resources.

Prefer optimistic concurrency and eventual consistency.

Minimize time that connections and resources are in use.

Minimize number of connections required.
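
As a rough illustration of the Queue Worker practices above (queue intensive requests as background tasks, then distribute them across multiple workers), here is a minimal in-process sketch in Python. The `resize_image` task, the `run_workers` helper, and the worker count are hypothetical stand-ins; a production system would more likely use a managed queue service and separate worker processes.

```python
import queue
import threading


def resize_image(job_id: int) -> str:
    """Hypothetical stand-in for an I/O- or CPU-intensive task."""
    return f"job-{job_id}-done"


def run_workers(job_ids, worker_count=4):
    """Process jobs on a pool of competing workers and return the results."""
    jobs = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job_id = jobs.get()
            try:
                if job_id is None:      # sentinel: no more work for this worker
                    return
                outcome = resize_image(job_id)
                with results_lock:      # results list is the only shared state
                    results.append(outcome)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job_id in job_ids:
        jobs.put(job_id)
    for _ in threads:
        jobs.put(None)                  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run_workers(range(5))))
```

The workers hold no state of their own, so the pool can grow or shrink without coordination beyond the queue itself. Completion order is not guaranteed, which is why the usage line sorts the results.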

Resiliency

Redundancy

Load Balancing

Retry

Circuit Breaker

Replication

Healthchecks

Telemetry

Understand customer SLA for availability.

Analyze system to identify failures, impact, and recovery.

Use redundant components to eliminate single points of failure.

Use load balancing to distribute requests.

Handle transient failures with limited retries and backoff.

Handle persistent failures with a circuit breaker that falls back to a reasonable action while the dependency is unavailable.

Use multiple availability zones.

Monitor health of dependencies and endpoints.

Checkpoint long-running transactions.

Design for failure and self-healing.

Understand replication methods for data sources.

Automate persistent data backup.

Document failover/failback processes and test them.

Throttle excessively active clients. Block bad actors (DDoS).

Perform fault injection testing to verify system resiliency.
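
The Retry and Circuit Breaker practices above can be sketched in a few lines of Python. This is a simplified illustration, not a production library: `TransientError`, the thresholds, and the injected `sleep` and `clock` parameters are all assumptions made so the behavior is easy to test.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                       # retries exhausted: surface the failure
            sleep(base_delay * 2 ** attempt)


class CircuitBreaker:
    """Opens after max_failures consecutive errors; allows a trial call after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()           # open: do not touch the dependency
            self.opened_at = None           # half-open: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                   # success closes the circuit
        return result
```

A caller might wrap a remote lookup as `breaker.call(fetch_price, lambda: cached_price)`, where both names are placeholders. While the circuit is open, the fallback is returned immediately and the failing dependency gets time to recover.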

Security

Defense in Depth

Least Privilege

Traceability

Federated Identity

Gatekeeper

Compartmentalize

Apply defense in depth; secure all resources, not just the edges.

Secure weakest link. Trust reluctantly and verify. Fail securely.

Pay attention to data privacy and residency requirements.

Protect data at rest (storage encryption) and in transit (SSL/TLS).

Mitigate DDoS using cloud platform’s network layer.

Enforce ACLs at network, application, and data layers.

Conduct vulnerability analysis and penetration tests.

Manage keys carefully and secure with hardware tokens.

Use SSO, multi-factor authentication, and federated identity.

Use anti-virus and anti-malware for network and host nodes.

Simplify business continuity and disaster recovery (BCDR) through PaaS-centric, automated backup and recovery.

Integrate diagnostics of the network, application, and data layers to monitor the system and correlate intrusions across the enterprise.

Prefer connectivity from cloud to on-prem resources using dedicated, private WAN links vs. VPN tunnels over public links.
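
To make the Least Privilege, "fail securely", and Traceability ideas above a little more concrete, here is a small application-layer sketch in Python. The roles, resources, and the `GRANTS` policy are invented for illustration; a real system would typically rely on the cloud platform's identity and access management service rather than a hand-rolled table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    role: str
    resource: str
    action: str


# Hypothetical policy: each role lists only the actions it needs (least privilege).
GRANTS = frozenset({
    Grant("reader", "orders", "read"),
    Grant("clerk", "orders", "read"),
    Grant("clerk", "orders", "write"),
})


def is_allowed(roles, resource, action, grants=GRANTS):
    """Return True only when an explicit grant matches; anything else is denied."""
    return any(Grant(role, resource, action) in grants for role in roles)


def authorize(roles, resource, action, audit_log):
    """Check access and record the decision so it can be traced later."""
    allowed = is_allowed(roles, resource, action)
    decision = "allow" if allowed else "deny"
    audit_log.append((tuple(roles), resource, action, decision))
    if not allowed:
        raise PermissionError(f"{action} on {resource} denied")
```

The key design choice is default deny: an unknown role, resource, or action falls through to a refusal instead of an accidental allow, and every decision is appended to the audit log either way.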

Application Design

High Cohesion

Loose Coupling

Single Responsibility

Open/Closed

Interface Segregation

Dependency Inversion

DDD

CQRS

RESTful Web API

Messaging

Design with the organization's goals and the end user in mind.

Design for evolution and change.

Prefer loosely coupled components that communicate asynchronously, so they can evolve, heal, and scale independently.

Separate infrastructure logic from domain logic.

Prefer RESTful Web APIs for external communication.

Prefer asynchronous messaging for internal communication.
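
One way to read "separate infrastructure logic from domain logic" together with Dependency Inversion is sketched below in Python. The `Order` model, the `OrderService` rule, and the repository names are hypothetical; the point is only that the domain code depends on an interface it owns, not on a particular database.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    total: float
    paid: bool = False


class OrderRepository(Protocol):
    """Port owned by the domain; infrastructure supplies the implementation."""

    def get(self, order_id: str) -> Order: ...

    def save(self, order: Order) -> None: ...


class OrderService:
    """Domain logic: knows the business rule, not how orders are stored."""

    def __init__(self, repository: OrderRepository):
        self._repository = repository

    def pay(self, order_id: str, amount: float) -> Order:
        order = self._repository.get(order_id)
        if amount < order.total:
            raise ValueError("payment does not cover the order total")
        order.paid = True
        self._repository.save(order)
        return order


class InMemoryOrderRepository:
    """Infrastructure adapter for tests; a SQL or document-DB adapter could replace it."""

    def __init__(self):
        self._orders = {}

    def get(self, order_id):
        return self._orders[order_id]

    def save(self, order):
        self._orders[order.order_id] = order
```

Because `OrderService` only sees the `OrderRepository` interface, the storage technology can change without touching the business rule, which is one practical meaning of "design for evolution and change".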

Management

Telemetry

Automation

Source Control

Agile

Design for IT Ops (Deploy, Monitor, Investigate, Secure)

Document system release process and use change control.

Automate system build and deployment processes.

Implement logging and alerting into systems.

Instrument to analyze root cause of errors.

Instrument to monitor availability, performance, and health.

Standardize log formats and metrics.

Inventory, inspect, and audit cloud assets.

Use distributed tracing (asynchronous, Correlation ID).

Version and control configuration like other system artifacts.

Use Agile project methodology for iterative development.
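
The practices about standardized log formats and correlation IDs can be sketched with Python's standard library. The JSON field names, the `handle_request` function, and the `demo` helper are assumptions for illustration; real services usually receive the correlation ID from an incoming request header and forward it on outgoing calls.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the correlation ID for the current request or task ("-" when none is set).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit every record in one standard JSON shape, including the correlation ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(name, stream):
    """Create a logger that writes standardized JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    """Reuse the caller's ID if one was sent; otherwise start a new trace."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("request received")
        logger.info("request completed")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)


def demo():
    """Run one request and return the parsed log entries."""
    stream = io.StringIO()
    logger = build_logger("demo", stream)
    handle_request(logger, incoming_id="abc123")
    return [json.loads(line) for line in stream.getvalue().splitlines()]
```

Every entry returned by `demo()` carries the same `correlation_id`, so log lines from different components that handled one request can be joined during root-cause analysis.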

References

● https://aws.amazon.com/architecture/

● https://docs.microsoft.com/en-us/azure/architecture/

This article is a reboot of my blog. Posts will consist of book reviews, informal musings on economics, history, and software, as well as more formal, didactic articles on technology.

Here is a book summary and review of Scalability Rules by Abbott and Fisher. Practical. Concise. Packed with good advice on design, infrastructure, and organizational processes. The latest edition includes stories that give context to the rules. Highly recommended.

Category	Rule
Design	Do not over-engineer the solution. Design scale into solutions: design @ 20X, implement @ 3X, deploy @ 2X. Avoid single points of failure. Avoid sequencing components and systems in synchronous series. Strive for statelessness. Watch out for server affinity. Communicate asynchronously as much as possible (prefer pub-sub to RPC). Design your application to be monitored.
Reduce	Reduce object usage, including DB, DNS, sockets, images, CPU, memory, disk, etc.
Cache	Cache static, infrequently changing information and objects appropriately. Use web page cache, image cache, application reference data cache, etc.
Scale	Design to clone things on commodity hardware behind a network load balancer. Design to split different and similar things. Design your solution to scale out. Use the AKF scale cube. Scale out your data centers. Design to leverage the cloud, but be wary of scaling through third parties. Details matter.
Platform	Use databases appropriately. Need ACID (Atomic, Consistent, Isolated, Durable)? Use firewalls appropriately. Use log files actively. Do not double-check your work: do not re-read your data after a write. Stop redirecting traffic. Leverage Content Distribution Networks (CDN). Use Expires headers in HTTP responses to reduce duplicate requests for static data. Purge, archive, and cost-justify storage.
Process	Test. Measure. Rinse. Repeat. Learn aggressively and especially from failure through team discussion. Do not rely on QA to find mistakes. Design for rollback of code. Be aware of costly relationships and dependencies (networks, databases, etc).
Database	Understand object relationships. Use the correct database lock. Avoid multi-phase commit. Avoid select for update. Avoid select *. Separate business intelligence from transaction processing.
General	Do not do anything more than once. Do not do anything that is unnecessary. Get as much as possible in one request.
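
The Scale row's advice to clone things, split different things, and split similar things corresponds to the three axes of the AKF scale cube. Here is a hedged Python sketch of how a router might combine them; the service names, shard counts, and naming scheme are made up for illustration and are not taken from the book.

```python
import hashlib

# Hypothetical topology: services (Y axis), shards per service (Z axis),
# and identical clones behind each shard (X axis).
SERVICES = {"catalog": 2, "checkout": 4}    # service name -> shard count
CLONES_PER_SHARD = 3


def shard_for(customer_id: str, shard_count: int) -> int:
    """Z axis: map a customer to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(service: str, customer_id: str, request_number: int) -> str:
    """Pick a target by function (Y), then by customer (Z), then by clone (X)."""
    shard = shard_for(customer_id, SERVICES[service])
    clone = request_number % CLONES_PER_SHARD   # round-robin stand-in for a load balancer
    return f"{service}-shard{shard}-clone{clone}"
```

The same customer always lands on the same shard of a service, while successive requests rotate across that shard's clones. `hashlib` is used instead of the built-in `hash()` because the latter is randomized per process for strings.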

Neoclassical Narratives

Saturday, August 26, 2017

Principles and Practices of Cloud Applications

Saturday, December 3, 2016

Book Summary | Scalability Rules