CRDTs - Building distributed collaborative tools
How to build a peer-to-peer distributed consensus free collaborative software
I reach out to meet you in your inbox with my best wishes for the coming year 2023. A Happy New Year to you my friend, my reader! Let’s start the year with a bang! In this article, I dive deep into CRDTs and some most common problems.
CRDTs (Conflict-free Replicated Data Type)-
CRDT is a data structure that is replicated across multiple nodes in a distributed network, with the following features:
A node can update any replica independently, concurrently and without any coordination with other replicas.
An algorithm (which is a part of the data structure itself) automatically resolves any inconsistencies that might occur.
Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge. This feature is also known as Optimistic Replication.
An approach to replication in which a replica can process updates locally, without waiting for communication with other replicas. T his means that replicas can become temporarily inconsistent, but CRDTs ensure that replicas nevertheless converge towards a consistent state (see Strong Eventual Consistency).
The CRDT concept was formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski. It is widely used in Soundcloud, Riak, Redis etc. The motivating factor behind development of CRDTs were collaborative text editing softwares.
CRDT: Easy to do a bad implementation -
CRDTs are easy to implement but it can cause some hard to overcome problems. In these post I will be talking about the two major problems.
Interleaving anomalies - Collaborative Text Editor
Reordering elements in list CRDT - Collaborative To-do List
Interleaving Anomalies - The problems in Collaborative Text Editor
We can start by assuming that a text document can be represented on a number line. The start of the document is 0 and the end is 1. The characters in between can be represented by rational numbers lying between them. (To the math folks, did you remember that a rational set is dense? Can you send the proof? 😉 )
Let us assume that the initial document contain only the word - “Helo”.
Now, Node A and Node B changes the document concurrently by adding “l” and “!” respectively.
H e l o(0.2, 0.4, 0.6, 0.8) ⇒ edit to H e l l o !
Node A changes - {
(0.2, A, “H”), (0.4, A, “e”),
(0.6, A, “l”), (0.8, A, “o”)
}
Union
Node B changes - {
(0.7, A, “L”), (0.2, B, “!”)
}
= {
(0.2, A, “H”), (0.4, A, “e”),
(0.6, A, “L”), (0.7, A, “L”),
(0.8, A, “o”), (0.9, B, “!”)
The above was an example of the working of CRDT and its conflict resolution. The conflicts are resolved by giving preference to the random rational numbers assigned to the characters. Some CRDT implementations also randomise the rational number generation so that we don’t see conflicts.
However, this implementation has a problem! The number generated don’t know how to preserve order across nodes! Hence we get interleaving problem(check out the image below).
Following are some Text Editing CRDTs -
Logroot , LSEQ - interleaving
RGA - lesser anomaly - meaning it has very less chances of having interleaving. This is because of the tree based data structure which RGA uses. Here is a solution by Martin Klepmann et al which overcomes interleaving.
Treedoc - no interleaving (no proof, just conjecture) poor storage performance
Woot - no interleaving (no proof, just conjecture) poor storage performance
Astrong - interleaving
Reordering Problem - The problems in Collaborative Todo List
Let us start with the following -
Replica A
Task 1
Task 2
Task 3
transformed to
Task 3
Task 1
Task 2
CRDTS don’t support “move” out of the box. It supports insert and delete. If we implement move as “delete and reinsert”, it will introduce duplication.
To overcome this issue CRDTs have implemented List as follows -
List CRDT + AWSet(Add-Wins Set) + LWW(Last Write Wins).
Add-wins set (AWSet):
A set datatype in which additions take precedence over removals. For example, if one replica removes and re-adds an element, while another replica concurrently removes the element, then the merged outcome is that the element is in the set. Contrast with remove-wins set.
This, however, doesn’t solve Moving range of items problem. This problem is still unsolved!
If you want to use CRDTs to build tools in javascript, one can use the following library - https://github.com/automerge/automerge.
Please feel free to reach out to me if you need help with building large scale distributed systems, deep learning or data engineering solutions at your organisation, I would be more than happy to help.
I hope that you enjoyed this read, if so then please share this article with your friends, let’s build a solid community. I will be back soon with another well thought/researched article delivered straight in your inbox, more consistent this year. See you!