Designing Distributed Systems

Share this post

Understanding Causality and Order: Distributed Systems

distributeddataengines.substack.com

Understanding Causality and Order: Distributed Systems

Time, Clocks and Order

Vipul Vaibhaw
Jan 28
1
Share this post

Understanding Causality and Order: Distributed Systems

distributeddataengines.substack.com

In the following post, we will understand and examine various methods used to understand time and event order in a distributed system.

A way of representing the system of events. Taken from - https://lamport.azurewebsites.net/pubs/time-clocks.pdf

Introduction

Distributed systems are powerful and they provide solutions to complex problems by clubbing multiple computers together. One of the biggest problems in distributed systems which we gain over a single machine system is establishing a correct timeline of events.

This is needed for understanding the ordering and causality of events within system. We need this to do multiple operations. Let us say there are two operations received INSERT and UPDATE in a distributed database. Unless we have a correct order, the operation will throw the error because you can’t update something which hasn’t been inserted yet.

There are two ways to record time in distributed systems -

  1. Clock Synchronisation (Physical Clocks)

  2. Logical Clocks

Designing Distributed Systems is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


Clock Synchronisation

Physical Clocks are tied to the notion of real time. It can be used to order events, find time difference between two events, etc.

In this system, each node has a local clock used by it to timestamp events at the node. The local clocks of different nodes may vary but they need to be synchronized. Read about Clock Synchronisation problem is you are more interested in this topic. A common approach for pair-wise synchronisation between computers is through the use of the client-server model i.e let clients contact a time server, i.e Network Transfer Protocol. Berkeley’s algorithm is another way to solve this problem.

We won’t go into too much detail in this section for this article.

Share

Logical Clocks

An approximation of time based on Physical Clocks may be good enough if events infrequently. However, in most distributed systems this is not a good enough guarantee. Therefore, we have to look to a virtual way of expressing time between machines, so we can keep the ability to place events into an accurate timeline. This is the notion of Logical Clocks.

Logical Clocks refer to implementing a protocol on all machines within your distributed system, so that the machines are able to maintain consistent ordering of events within some virtual timespan.

Lamport’s Clocks

A breakthrough paper Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport defined the “happens-before” relationship “→” as follows:

  1. If a and b are events in the same process, and a comes before b, then a→b.

  2. If a is the sending of a message by one process and b is the receipt of the same message by another process, then a→b.

  3. If a→b and b→c, then a→c.

If events x, y occur in different processes that do not exchange messages, neither x → y nor y → x is true, x and y are said to be concurrent.

A Limitation of Lamport Clock(or Scalar Logical Clock) is the following - x → y implies C(x) < C(y) but C(x) < C(y) doesn’t imply x → y!

Also, In order to become causally consistent we need a way of representing local time and global time separately. This is where Vector Clocks come in.

Vector Clocks

Vector Clocks, a more advanced version of Lamport’s Logical Clock. They are designed to overcome scalar clock problem.

For each process Pᵢ of the system, the algorithm maintains a vector VCᵢ with the following attributes:

  • Cᵢ[j]: local logical clock at Pᵢ, or the number of events that have occurred before the current timestamp.

  • If Cᵢ[j] = k, Pᵢ knows that k events have occurred at Pj.

The algorithm of Vector Clocks then goes as follows:

  1. Before executing an event, Pᵢ records a new event happens at itself by executing Cᵢ[i] <- Cᵢ[i] + 1.

  2. When Pᵢ sends a message m to Pj, it sets timestamp t(m) equal to Cᵢ after having executed the previous step.

  3. When message m is received, process Pj update each k of its own vector clock: Cj [k] ← max { Cj [k], t(m)[k]}. Then it continues executing the first step and delivers the message to the application.

By these steps, when a process Pj receives a message m from process Pᵢ with timestamp t(m), it knows the number of events that have occurred at Pᵢ casually preceded the sending of m. Furthermore, Pj also knows about the events that have been known to Pᵢ about other processes before sending m.

Problems of Vector Clock 

  • Message size increases since each message needs to be tagged with the vector 

  • Size can be reduced in some cases by only sending values that have changed

  • Can also send only a scaler to keep track of direct dependencies only, with indirect dependencies computed when needed  Tradeoff between message size and time

Share Designing Distributed Systems


When it comes to the concept of time in distributed system, the primary goal is to achieve the correct order of the events. It can be done either in chronological order with Physical Clocks or in logical order with Lamport’s Logical Clocks and Vector Clocks along the execution timeline.

References -

  • https://lamport.azurewebsites.net/pubs/time-clocks.pdf

  • https://levelup.gitconnected.com/distributed-systems-physical-logical-and-vector-clocks-7ca989f5f780

  • https://medium.com/distributed-knowledge/time-synchronization-in-distributed-systems-a21808928bc8

  • https://profile.iiita.ac.in/bibhas.ghoshal/lecture_slides/distributed/Clock.pdf

Please feel free to reach out to me if you need help with building large scale distributed systems, deep learning or data engineering solutions at your organisation, I would be more than happy to help.

Read more of my work -

Designing Distributed Systems
Gentle Introduction to Consensus
Happy Sunday! This blog introduces the concept of consensus and replicated state machines. This is first post of a series of posts around consensus. Introduction A replicated state machine is deterministic. It has to be replicated across m…
Read more
2 months ago · 2 likes · Vipul Vaibhaw

Also check out my Youtube Channel -

I hope that you enjoyed this read, if so then please share this article with your friends, let’s build a solid community. I will be back soon with another well thought/researched article delivered straight in your inbox. See you!

You can connect with me on Twitter and LinkedIn.

Share this post

Understanding Causality and Order: Distributed Systems

distributeddataengines.substack.com
Comments
TopNew

No posts

Ready for more?

© 2023 Vipul Vaibhaw
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing