Version Vectors (I)
In my last article, we talked about Vector Clocks. We learned that Vector Clocks help establish the ordering of operations, including identifying whether operations happened concurrently or were causally related. Version Vectors use a very similar mechanism, but for a slightly different purpose.
Version Vector
Version Vectors are generally used in distributed, data-driven applications, where each data record is tied to a Version Vector. Since it’s a distributed system, multiple nodes can update a data record concurrently. We can therefore use Version Vectors to identify whether an update to a data record can be reconciled immediately or requires conflict resolution. (Remember that we can identify concurrent updates, and a concurrent update to a data record needs conflict resolution.)
Just like Vector Clocks, Version Vectors also maintain a vector per node, but the entries no longer represent logical time.
Before going further, let’s specify the criteria for identifying a concurrent update. Similar to Vector Clocks, we compare two Version Vectors to understand the relationship between them -
Examples (A-D are actors, and the numbers represent event counts, just like in Vector Clocks) -
Now, it’s critical to identify where things differ from Vector Clocks. Always remember that a Version Vector exists for each data item, so the comparison in the example above is a comparison of the vectors held by different actors for the same data record.
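As a rough illustration of that comparison, here is a minimal Python sketch. It assumes each Version Vector is a plain dict from actor name to counter, and that an actor missing from a vector counts as 0. The names `Order` and `compare` are made up for this example and don’t come from any particular data store.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"            # both vectors are identical
    BEFORE = "before"          # a is an ancestor of b
    AFTER = "after"            # a descends from b
    CONCURRENT = "concurrent"  # neither has seen the other: conflict


def compare(a: dict[str, int], b: dict[str, int]) -> Order:
    """Compare two version vectors held for the SAME data record."""
    actors = a.keys() | b.keys()
    # Assumption: a missing actor entry is treated as a count of 0.
    a_ahead = any(a.get(x, 0) > b.get(x, 0) for x in actors)
    b_ahead = any(b.get(x, 0) > a.get(x, 0) for x in actors)
    if a_ahead and b_ahead:
        return Order.CONCURRENT
    if a_ahead:
        return Order.AFTER
    if b_ahead:
        return Order.BEFORE
    return Order.EQUAL


# Two replicas' vectors for the same record:
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # Order.AFTER
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 2}))  # Order.CONCURRENT
```

Only the `CONCURRENT` result calls for conflict resolution. With `BEFORE` or `AFTER`, the newer version can simply replace the older one.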
The algorithm follows these steps -
We’ll talk about two approaches that we can use to identify conflicts, and the advantages and pitfalls of both.
Client As An Actor
As we read in the algorithm above, each write is associated with an Actor. Since we’re treating each client as an Actor, the client responsible for the write sends its identifier along with the write. If a conflict is detected, a sibling is added with the version vector corresponding to the write from that particular client.
Let’s try to understand what’s happening in the above diagram -
Advantage -
By using client IDs to identify conflicts, we can easily handle concurrent updates to the same key.
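To see this in action, here’s a minimal Python sketch of a store that treats each client as an Actor. It’s an assumption-heavy toy, not how any specific database implements it: `Store`, `Sibling`, `put`, `merge`, and `descends` are hypothetical names, the client passes the version vector it last read as `context`, and the store bumps that client’s entry on every write.

```python
from dataclasses import dataclass, field

Vector = dict[str, int]


def descends(a: Vector, b: Vector) -> bool:
    """True if a has seen everything b has (a >= b for every actor in b)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def merge(vectors: list[Vector]) -> Vector:
    """Pointwise maximum: the context a client holds after reading all siblings."""
    merged: Vector = {}
    for vector in vectors:
        for actor, count in vector.items():
            merged[actor] = max(merged.get(actor, 0), count)
    return merged


@dataclass
class Sibling:
    value: str
    vector: Vector


@dataclass
class Store:
    data: dict[str, list[Sibling]] = field(default_factory=dict)

    def get(self, key: str) -> list[Sibling]:
        return list(self.data.get(key, []))

    def put(self, key: str, value: str, client_id: str, context: Vector) -> Vector:
        # Assumption: the client is the actor, so its own entry is incremented.
        new_vector = dict(context)
        new_vector[client_id] = new_vector.get(client_id, 0) + 1
        # Stored values this write has not seen are concurrent: keep them as siblings.
        survivors = [s for s in self.data.get(key, [])
                     if not descends(new_vector, s.vector)]
        self.data[key] = survivors + [Sibling(value, new_vector)]
        return new_vector


store = Store()
store.put("cart", "milk", client_id="X", context={})  # stored with {X: 1}
store.put("cart", "eggs", client_id="Y", context={})  # concurrent, {Y: 1}
print(len(store.get("cart")))  # 2 siblings: a conflict was detected

# Client X reads both siblings, resolves them, and writes back with the merged context.
seen = merge([s.vector for s in store.get("cart")])
store.put("cart", "milk, eggs", client_id="X", context=seen)
print(store.get("cart")[0].vector)  # {'X': 2, 'Y': 1}
```

Neither write from X or Y is lost here: the store keeps both until a client that has seen both writes a resolved value.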
Concerns -
This brings us to the end of this article. We learned about Version Vectors and where they differ from Vector Clocks. We looked at how distributed data stores that rely on eventual consistency can use Version Vectors to identify concurrent updates. We saw the advantages and pitfalls of using the Client as an Actor, and in the next article, we’ll look at other approaches to identifying concurrent updates and more!
Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.
Please comment on the post with your feedback; it will help me improve! :)
Until next time, keep asking questions and keep learning!