The CAP Theorem Distilled
Anyone who has studied Distributed Computing in a graduate school computer science course appreciates that one of the hardest problems to solve is that of keeping writes in multiple locations synchronized. Are the writes synchronous or asynchronous? If the latter, what is the replication delay in the average and worst cases? How are write-inconsistencies resolved in the face of parallel asynchronous writes? Can you ensure that the last write wins? How?
If synchronous, then why don’t you worry about availability, write-throughput, and latency?
The CAP theorem states that it is possible to optimize at most 2 of ‘C’, ‘A’, and ‘P’. ‘C’ refers to strong consistency, ‘A’ refers to availability, and ‘P’ refers to network partition tolerance.
‘P’ in CAP
Network partitions play a role in systems with replicated data. In order to keep copies of data consistent with one another, the replicas need to communicate over the network. When a network partition occurs (e.g. due to a fiber cut or downed route), different replicas might see different views of the data. Naturally, this can result in inconsistent data. Any system needs to support ‘P’.
‘A’ vs ‘C’ in CAP
If we can have only 2 of ‘C’, ‘A’, and ‘P’ and ‘P’ needs to be supported by the system, we need to choose between ‘C’ and ‘A’. If you choose ‘C’, your system might implement 2-phase commit (a.k.a 2PC) . In this CP system, writes to multiple physical hosts are being conducted in a single transaction that is managed by the writer. The client is implementing the 2PC. 2PC can limit availability for the following reasons:
- The client is monopolizing a connection to each storage node involved in the transaction for the duration of the 2PC
- This reduces maximum write throughput as connections are underutilized
- The client takes longer to commit a write to any one storage node
- This increases write latency
On the other hand, if you opt for an AP system, you are opening the door to potential data inconsistencies. Your writes might occur synchronously to a subset of nodes and resolved asynchronously during write propagation or during read (a.k.a. read repair). AP systems can get quite complicated (relative to CP systems).
Many eCommerce and social networking sites need to be highly available, but can live with permanent or temporary data inconsistencies. Oftentimes, it is enough for the data to be consistent, not accurate. I call this the “Consistency, not Accuracy” principle. With the exception of financial systems and a few other cases, most online businesses can live by this principle.
I recommend the following white-paper that discuss CAP and how Netflix looks at it: