Carl Willimott - Tech Blog: CAP Theorem: failures will happen

CAP Theorem: failures will happen

March 20, 2023

3 min read

Back

CAP theorem AKA Brewer's theorem is most likely the most popular one in computer science. It's your classic pick two out of three, but in reality you typically only need to worry about this when you have a distributed system as we will explore in the article.

Definition

CAP is actually an acronym, which stands for the following:

  • Consistency - you will receive the same response regardless of the node which serves your request.
  • Availability - every request is guaranteed to return the most up-to-date information.
  • Partition tolerance - the system is able to function despite an arbitrary number of failing nodes.

Consistency

Imaging you have a number of nodes which are able to process a request, picture a number of servers around the globe in different geographies. It's reasonable to expect that when making such a request, it shouldn't matter which node serves your request and as long as this holds true, and each one is able to return the same response we can say that the system is consistent.

Availability

Like with our previous example, it's fairly common to have multiple nodes capable of serving a request (think horizontal scalability), but would you expect each one to return the same response?

Imagine if you provide a popular API which relies on users sharing data and serves millions of users each day. As one could reasonably expect, there is a networking problem in one data centre which means that several nodes lose connection. You may decide that in order to continue allowing access to the API even though there are issues because it's fine for the data to become eventually consistent and prefer this to having a partial outage or degraded service offering. In this case we can say that the system prioritises availability.

Partition tolerance

For a single node system, the data will always be consistent and available. As soon as multiple nodes are introduced it's paramount to consider which is more important for your specific use case.

Is it really so binary?

No not really...

In the case of a network partition a system may take a hybrid approach with varying levels of consistency and availability depending on a particular request, how important it is or any other parameters which are deemed necessary. The key to excellent system design is to ensure that you plan for as many eventualities which are feasible and ensure that the user experience is of the highest priority.

Useful links