Gentle Introduction to Consensus
Replicated State Machines.
This blog introduces the concept of consensus and replicated state machines. This is first post of a series of posts around consensus.
A replicated state machine is deterministic. It has to be replicated across many computers but overall functions as a single state machine. This state machine will still function, even if there are network partitions.
In a replicated state machine, if a transaction(may be distributed like 2PC and 3PC) is valid, a set of inputs will cause the state of the system to transition to the next state.
Transactions are the atomic, it means either they happen or don’t. algorithms basically provide strong consistency, but are powerless to network partitions.
In distributed system, although the nodes are different, the system appears as a single node to end-user(s). Multiple copies/replicas of the database run on multiple nodes at the same time.
What do we mean by Correctness in a distributed system?
For a distributed system to be correct, these two properties should be satisfied.
Safety — A safety property proscribes discrete bad things from occurring during an execution. For example, no deadlock.
Liveness — A liveness property prescribes good things for every execution or, equivalently, describes something that must happen during an execution. For example, Fair access to a resource in the presence of contention.
What are some assumptions in a replicated state machine?
The replicated state machine must continually accept new transactions into transaction log. It must do so despite the fact that:
Unavailability of some nodes in the cluster.
Unreliable network and non-preservation of order of messages received/sent.
There is no global clock to help determine the order of events.
Hence, the need of Consensus Algorithms!
what is the difference between consensus and data consistency?
These two are very similar, confusing and can even be interchanged in many occasions, but here are the subtle differences:
Data consistency, more like outcomes and goals, is the desired state of the system, but does not define how this state is achieved.
Consensus, which is also the result and goal, but also includes a general method to reach this state of consensus, which is voting.
More about Consensus
The application scenarios of consensus algorithms are very wide. When we jump out of the context of data replication that leads to data consistency, and from the perspective of the consensus algorithm of voting, consensus is everywhere:
Leader election — Elect a leader and let every node to know about it.
Mutual exclusion — Only one node at a time access the critical section.
Distributed transaction — Commit or abort distributed transaction.
Note - The common consensus algorithms(RAFT, PAXOS) choose to avoid the Byzantine Generals Problem, which is the situation where nodes are malicious(Bitcoin has Proof of Work). Since consensus algorithms are usually applied to internal independent systems, this premise is also generally acceptable.
We consider a consensus protocol correct if and only if the following three conditions are satisfied.
1. Agreement — All non-faulty nodes decide on the same output value.
2. Validity — The decided value must have been proposed by some node in the group.
3. Termination— All non-faulty nodes eventually decide on some output value.
The traditional consensus algorithms works with voting-based model. There is another type of consensus algorithms which used in proof-based systems.
In voting-based system, one node propose a values, other nodes vote for that value. Votes are exchange between nodes via via message passing.
With the invention of blockchain we have further classification of distributed consensus -
Deterministic - Number of nodes are known at all times. Database for example.
Non-Deterministic - Nodes are free to join or leave the cluster at any time. Blockchain.
What about the message arrival time? Is is easier to make assumptions when we have a timeout scenario?
In synchronous environments, messages are delivered within a fixed time frame
In asynchronous environments, there’s no guarantee of a message being delivered.
The theory proposed by Fischer, Lynch, and Paterson in 1985 on their paper Impossibility of Distributed Consensus with One Faulty Process. It says, even a single faulty process makes it impossible to reach consensus among deterministic asynchronous system.
Reaching consensus in a synchronous environment is possible because we can make assumptions about the maximum time it takes for messages to get delivered.
There are, however, two ways to navigate through FLP:
Use synchrony assumptions : use timeouts. This is what is used by RAFT and PAXOS.
Use non-determinism: Blockchains.
I hope I was able to help you out with the basic understanding of the world of consensus. Obviously, one post is not enough and I am planning to dig deeper and go more technical in Paxos/Raft next time. I would encourage you to go through the references.
I am also planning to publish an article on Merkel Trees.
Please feel free to reach out to me if you need help with building large scale distributed systems, deep learning or data engineering solutions at your organisation, I would be more than happy to help.
Read more of my work -
I hope that you enjoyed this read, if so then please share this article with your friends, let’s build a solid community. I will be back soon with another well thought/researched article delivered straight in your inbox, more consistent this year. See you!