原文地址:https://github.com/aphyr/distsys-class
这篇文章相当于一个分布式学习的大纲,以我的3遍学习发来看,我这周只是通读了2遍,只达到了第一层级,那这周的文章就当作一个阅读大纲的提炼,先理清楚知识架构。分布式内容本身就十分多,如果不理清结构,很容易混乱,今天就是一个入门的开始。
- What makes a thing distributed
- Nodes and networks
- When networks go wrong
- Low level protocols
- Clocks
Review
We’ve covered the fundamental primitives of distributed systems. Nodes exchange messages through a network, and both nodes and networks can fail in various ways. Protocols like TCP and UDP give us primitive channels for processes to communicate, and we can order events using clocks. Now, we’ll discuss some high-level properties of distributed systems.
- Availability
- Consistency
- Tradeoffs
Review
Availability is a measure of how often operations succeed. Consistency models are the rules that govern what operations can happen and when. Stronger consistency models generally come at the cost of performance and availability. Next, we’ll talk about different ways to build systems, from weak to strong consistency.
- Avoid Consensus Wherever Possible
- Fine, We Need Consensus, What Now?
Review
Systems which only add facts, not retract them, require less coordination to build. We can use gossip systems to broadcast messages to other processes, CRDTs to merge updates from our peers, and HATs for weakly isolated transactions. Serializability and linearizability require consensus, which we can obtain through Paxos, ZAB, VR, or Raft. Now, we’ll talk about different scales of distributed systems.
- Characteristic latencies
Review
We discussed three characteristic scales for distributed systems: multicore processors coupled with a synchronous network, computers linked by a LAN, and datacenters linked by the internet or dedicated fiber. CPU consequences are largely performance concerns: knowing how to minimize coordination. On LANs, latencies are short enough for many network hops before users take notice. In geographically replicated systems, high latencies drive eventually consistent and datacenter-pinned solutions.
- Common distributed systems
Review
We use data structure stores as outsourced heaps: they’re the duct tape of distributed systems. KV stores and relational databases are commonly deployed as systems of record; KV stores use independent keys and are not well-suited to relational data, but offer improved scalability and partial failure vs SQL stores, which offer rich queries and strong transactional guarantees. Distributed search and coordination services round out our basic toolkit for building applications. Streaming systems are applied for continuous, low-latency processing of datasets, and tend to look more like frameworks than databases. Their dual, distributed queues, focus on the messages rather than the transformations.
- A Pattern Language
Review
When possible, try to use a single node instead of a distributed system. Accept that some failures are unavoidable: SLAs and apologies can be cost-effective. To handle catastrophic failure, we use backups. To improve reliability, we introduce redundancy. To scale to large problems, we divide the problem into shards. Immutable values are easy to store and cache, and can be referenced by mutable identities, allowing us to build strongly consistent systems at large scale. As software grows, different components must scale independently, and we break out libraries into distinct services. Service structure goes hand-in-hand with teams.
- Production Concerns
Review
Running distributed systems requires cooperation between developers, QA, and operations engineers. Static analysis, and a test suite including example- and property-based tests, can help ensure program correctness, but understanding production behavior requires comprehensive instrumentation and alerting. Mature distributed systems teams often invest in tooling: traffic shadowing, versioning, incremental deploys, and feature flags. Finally, queues require special care.