Designing Distributed Systems

Share this post

CRDTs - Building distributed collaborative tools

distributeddataengines.substack.com

CRDTs - Building distributed collaborative tools

How to build a peer-to-peer distributed consensus free collaborative software

Vipul Vaibhaw
Jan 1, 2023
Share

I reach out to meet you in your inbox with my best wishes for the coming year 2023. A Happy New Year to you my friend, my reader! Let’s start the year with a bang! In this article, I dive deep into CRDTs and some most common problems.

Designing Distributed Systems is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


CRDTs (Conflict-free Replicated Data Type)-

CRDT is a data structure that is replicated across multiple nodes in a distributed network, with the following features:

  1. A node can update any replica independently, concurrently and without any coordination with other replicas.

  2. An algorithm (which is a part of the data structure itself) automatically resolves any inconsistencies that might occur.

  3. Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge. This feature is also known as Optimistic Replication.

    An approach to replication in which a replica can process updates locally, without waiting for communication with other replicas. T his means that replicas can become temporarily inconsistent, but CRDTs ensure that replicas nevertheless converge towards a consistent state (see Strong Eventual Consistency).

The CRDT concept was formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski. It is widely used in Soundcloud, Riak, Redis etc. The motivating factor behind development of CRDTs were collaborative text editing softwares.


Thank you for reading Designing Distributed Systems. This post is public so feel free to share it.

Share


CRDT: Easy to do a bad implementation -

CRDTs are easy to implement but it can cause some hard to overcome problems. In these post I will be talking about the two major problems.

  1. Interleaving anomalies - Collaborative Text Editor

  2. Reordering elements in list CRDT - Collaborative To-do List

Interleaving Anomalies - The problems in Collaborative Text Editor

We can start by assuming that a text document can be represented on a number line. The start of the document is 0 and the end is 1. The characters in between can be represented by rational numbers lying between them. (To the math folks, did you remember that a rational set is dense? Can you send the proof? 😉 )

Let us assume that the initial document contain only the word - “Helo”.

Now, Node A and Node B changes the document concurrently by adding “l” and “!” respectively.

H e l o(0.2, 0.4, 0.6, 0.8) ⇒ edit to H e l l o !

Node A changes - {

(0.2, A, “H”), (0.4, A, “e”),

(0.6, A, “l”), (0.8, A, “o”)

}

Union

Node B changes - {

(0.7, A, “L”), (0.2, B, “!”)

}

= {

(0.2, A, “H”), (0.4, A, “e”),

(0.6, A, “L”), (0.7, A, “L”),

(0.8, A, “o”), (0.9, B, “!”)

The above was an example of the working of CRDT and its conflict resolution. The conflicts are resolved by giving preference to the random rational numbers assigned to the characters. Some CRDT implementations also randomise the rational number generation so that we don’t see conflicts.

However, this implementation has a problem! The number generated don’t know how to preserve order across nodes! Hence we get interleaving problem(check out the image below).

Following are some Text Editing CRDTs -

  1. Logroot , LSEQ - interleaving

  2. RGA - lesser anomaly - meaning it has very less chances of having interleaving. This is because of the tree based data structure which RGA uses. Here is a solution by Martin Klepmann et al which overcomes interleaving.

  3. Treedoc - no interleaving (no proof, just conjecture) poor storage performance

  4. Woot - no interleaving (no proof, just conjecture) poor storage performance

  5. Astrong - interleaving

Reordering Problem - The problems in Collaborative Todo List

Let us start with the following -

Replica A

  1. Task 1

  2. Task 2

  3. Task 3

transformed to

  1. Task 3

  2. Task 1

  3. Task 2

CRDTS don’t support “move” out of the box. It supports insert and delete. If we implement move as “delete and reinsert”, it will introduce duplication.

To overcome this issue CRDTs have implemented List as follows -

List CRDT + AWSet(Add-Wins Set) + LWW(Last Write Wins).

Add-wins set (AWSet):
A set datatype in which additions take precedence over removals. For example, if one replica removes and re-adds an element, while another replica concurrently removes the element, then the merged outcome is that the element is in the set. Contrast with remove-wins set.

This, however, doesn’t solve Moving range of items problem. This problem is still unsolved!


If you want to use CRDTs to build tools in javascript, one can use the following library - https://github.com/automerge/automerge.


Share Designing Distributed Systems

Please feel free to reach out to me if you need help with building large scale distributed systems, deep learning or data engineering solutions at your organisation, I would be more than happy to help.

I hope that you enjoyed this read, if so then please share this article with your friends, let’s build a solid community. I will be back soon with another well thought/researched article delivered straight in your inbox, more consistent this year. See you!

You can connect with me on Twitter and LinkedIn.

Share
Comments
Top
New

No posts

Ready for more?

© 2023 Vipul Vaibhaw
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing