What’s happening in this episode?
I ended the last episode with a bit of a cliffhanger (again!).
We reviewed two different CRDTs (GCounters and GSets) and learned that these two are actually the same structure underneath: a Map with two different value types.
But an important problem hasn’t been solved yet: how to delete values.
The difficulty arises because – implemented naively – deletion breaks monotonicity.
Consider a GSet {1, 2, 3}.
If you delete the 1, you arrive at {2, 3}.
But the previous set {1, 2, 3} is not a subset of {2, 3}!
Later, when merging, we don’t know if 1 has been deleted or was never there to begin with, as can be seen in the following transcript:
1. Alice and Bob both start with the set {2, 3}.
2. The Internet connection fails.
3. Alice adds 1 to the set.
4. Bob adds 1 to the set.
5. Alice deletes 1 from the set.
6. The Internet connection is restored.
Is the result {1, 2, 3} or {2, 3}? The naive implementation says the former because it’ll just take the union of both sets. The way to avoid this is by using tombstones.
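This failure mode is easy to reproduce in a few lines of JavaScript. The following is a hypothetical sketch using plain `Set`s as GSets and plain union as the merge:

```javascript
// Hypothetical sketch: GSets as plain Sets, merge as plain union.
const merge = (a, b) => new Set([...a, ...b]);

const alice = new Set([2, 3]);
const bob = new Set([2, 3]);

alice.add(1);    // step 3
bob.add(1);      // step 4
alice.delete(1); // step 5: Alice deletes the 1 again

const merged = merge(alice, bob);
// merged contains 1 again: the union "resurrects" the element, because
// it cannot distinguish "deleted" from "never added".
```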
The Ghost of Values Past
The idea behind tombstones is quickly explained: you never actually delete values, just mark them as deleted.
Alright, that’s it for today, stay tuned for the next episode.
I’m kidding. Of course, the idea is simple, but there’s a lot of stuff we still need to figure out. For example, how do we represent the deleted values? What will the programming interface be like?
Let’s start with the most basic example, the TwoPhaseSet (or 2PSet). The intended semantics is that removal bans, i.e., a value that has been removed is banned from ever being added again. This is why it’s called “Two Phase”: a value travels through the two phases of being a member of the set and having been a member of the set. Applied to the above transcript between Alice and Bob, the result would’ve been {2, 3}, because Alice’s deletion takes priority over Bob’s addition. Note that time does not play a role here; even if steps 4 and 5 were swapped, the result would still be {2, 3}.
Traditionally, 2PSets are implemented as a pair of GSets called A and R (for “Added” and “Removed”). Adding an element x adds it to A. Removing an element x adds it to R (assuming it is also in A, so that you can’t remove an element that was never in the 2PSet). In other words, the set R must always be a subset of the set A. The actual set of currently present elements is all elements of A that are not in R (the set difference, in mathematical terms).
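A minimal sketch of this pair-of-GSets design in JavaScript (class and method names are my own, for illustration only):

```javascript
// Sketch of a 2PSet as a pair of grow-only sets (names A/R as in the text).
class TwoPhaseSet {
  constructor() {
    this.A = new Set(); // everything that has ever been added
    this.R = new Set(); // everything that has been removed (tombstones)
  }
  add(x) {
    this.A.add(x); // A only ever grows
  }
  remove(x) {
    if (!this.A.has(x))
      throw new Error(`${x} was never in the set`);
    this.R.add(x); // R only ever grows, too
  }
  elements() {
    // current members: the set difference A \ R
    return new Set([...this.A].filter(x => !this.R.has(x)));
  }
}
```

Note that `add(x)` after `remove(x)` has no effect on membership: x lands in A again (where it already was), but stays in R, so the ban holds.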
Let’s apply this to the exchange from earlier. In the following table, the numbering is the same as above, but without the connection stuff, so steps 2 and 6 don’t really exist here:
| Step | Actress | Action   | Alice’s A | Alice’s R | Bob’s A   | Bob’s R |
|------|---------|----------|-----------|-----------|-----------|---------|
| 1    |         | (start)  | {2, 3}    | {}        | {2, 3}    | {}      |
| 3    | Alice   | add 1    | {1, 2, 3} | {}        | {2, 3}    | {}      |
| 4    | Bob     | add 1    | {1, 2, 3} | {}        | {1, 2, 3} | {}      |
| 5    | Alice   | delete 1 | {1, 2, 3} | {1}       | {1, 2, 3} | {}      |
Merging a 2PSet works by merging the two constituent GSets individually. In our example, this results in A = {1, 2, 3} and R = {1}. The difference between those sets is {2, 3}, meaning, 1 has been deleted and 2 and 3 remain in the set.
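Sticking with plain `Set`s, the component-wise merge can be sketched like this (a hypothetical illustration, not a library API):

```javascript
// Merge two 2PSets (each a pair {A, R} of plain Sets) by merging the
// constituent GSets individually, i.e. by taking the unions.
const union = (x, y) => new Set([...x, ...y]);
const merge2P = (p, q) => ({ A: union(p.A, q.A), R: union(p.R, q.R) });
const elements = ({ A, R }) => new Set([...A].filter(x => !R.has(x)));

// the state from the table above, after step 5
const alice = { A: new Set([1, 2, 3]), R: new Set([1]) };
const bob = { A: new Set([1, 2, 3]), R: new Set() };

const merged = merge2P(alice, bob);
// elements(merged) is {2, 3}: the tombstone in R wins over Bob's addition
```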
This implementation is simple enough, right? A possible optimization is that deleting elements could also delete the element from A, instead of just adding it to R. Then, the table can be changed as follows:
| Step | Actress | Action   | Alice’s A | Alice’s R | Bob’s A   | Bob’s R |
|------|---------|----------|-----------|-----------|-----------|---------|
| 1    |         | (start)  | {2, 3}    | {}        | {2, 3}    | {}      |
| 3    | Alice   | add 1    | {1, 2, 3} | {}        | {2, 3}    | {}      |
| 4    | Bob     | add 1    | {1, 2, 3} | {}        | {1, 2, 3} | {}      |
| 5    | Alice   | delete 1 | {2, 3}    | {1}       | {1, 2, 3} | {}      |
Then, we wouldn’t have to look at R when figuring out what elements are in the 2PSet currently. On the other hand, A wouldn’t be a GSet anymore, complicating the merge algorithm (but it’s still possible to implement it).
The set R contains a “memory” of old values that will never again appear in the 2PSet. This is why it is sometimes called the tombstone set.
2PSets are also Maps
In the last episode, I told you that GSets are special cases of maps. Notably, maps where the value type is a lattice, and where update operations need to be monotonic.
It turns out that this can also be used to model 2PSets. How? Well, first observe that in the non-optimized version introduced above, a value x can have three different states:
- it is neither in A nor in R
- it is in A but not in R
- it is in both A and R
Now we want to model both A and R as a single map S that describes each element’s state. The idea is that the elements of A and R are keys in the map and the corresponding values are booleans. Each possible state of an element x can be represented as follows:
- x is not defined in the map
- x is defined in the map and has the value false (read: “not yet deleted”)
- x is defined in the map and has the value true (read: “yes, now this thing has actually been deleted”)
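These three states translate directly into operations on a plain `Map` (a sketch with no monotonicity enforcement yet; helper names are my own):

```javascript
// Sketch of the single map S: element → "has it been deleted?"
const S = new Map();

const add = x => { if (!S.has(x)) S.set(x, false); }; // state 2
const del = x => { if (S.has(x)) S.set(x, true); };   // state 3
const isMember = x => S.get(x) === false;             // states 1 and 3: no

add(1); // 1 → false: a member
del(1); // 1 → true: still in the map, but only as a tombstone
add(1); // no-op: the tombstone is not overwritten
```

For membership, a tombstoned 1 and a never-added 2 look the same (`isMember` returns false for both), but S remembers the difference, which is exactly what merging needs.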
Let’s reconsider the transaction from above, but with just this single map S:
| Step | Actress | Action   | Alice’s S                            | Bob’s S                              |
|------|---------|----------|--------------------------------------|--------------------------------------|
| 1    |         | (start)  | {2 → false, 3 → false}               | {2 → false, 3 → false}               |
| 3    | Alice   | add 1    | {1 → false, 2 → false, 3 → false}    | {2 → false, 3 → false}               |
| 4    | Bob     | add 1    | {1 → false, 2 → false, 3 → false}    | {1 → false, 2 → false, 3 → false}    |
| 5    | Alice   | delete 1 | {1 → true, 2 → false, 3 → false}     | {1 → false, 2 → false, 3 → false}    |
Now, how does merging work?
In exactly the same way as before!
Alice and Bob compare their two maps.
If either of them has any key that’s lacking in the other’s map, it just gets added unchanged (we don’t have that in our example scenario).
If they have a common key but disagree on its value, they have to merge the values.
This is where things get important: true has priority over false.
In other words: the partial ordering for booleans is defined such that false ≤ true (as you would expect in JavaScript).
This also implies that when updating the map, it is only allowed to transition a false to a true (as you would expect from the semantics of a 2PSet).
This guarantees that an element of the 2PSet can never come back from the dead.
Adding a new key is always allowed, although a concrete implementation would probably want to make sure that you can’t directly add a true element, since that would be kinda pointless (not harmful, though, just pointless: why would you want to delete an element that has never existed?).
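Since the join of two booleans under false ≤ true is just logical OR, the whole merge can be sketched in a few lines (a hypothetical helper, not a library function):

```javascript
// Merge two 2PSet maps: keys missing on one side are copied over
// unchanged; common keys are joined with ||, so true ("deleted") wins.
function mergeMaps(a, b) {
  const result = new Map(a);
  for (const [key, value] of b)
    result.set(key, result.has(key) ? result.get(key) || value : value);
  return result;
}

// the state from the table above, after step 5
const alice = new Map([[1, true], [2, false], [3, false]]);
const bob = new Map([[1, false], [2, false], [3, false]]);

const merged = mergeMaps(alice, bob);
// merged maps 1 to true, so the resulting 2PSet is {2, 3}
```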
But the key takeaway is that even deletion fits neatly into our lattice-and-partial-ordering framework. That’s a relief, right? All that work is finally paying off.
No Free Lunch
We’ve now learned that we can get the following CRDTs “for free”:
- GSet
- GCounter
- 2PSet
In case you were wondering at what point I’m going to show the code, the answer is “not in this episode because there’s no new code to show”. All the pieces have been implemented already, and all the contracts have been checked.
And yet. There’s no free lunch.
While the underlying algebraic structures can neatly be composed to yield larger structures, an actual real-world CRDT library would not expose those directly to the user. Recall the sentence from earlier:
Notably, maps where the value type is a lattice, and where update operations need to be monotonic.
This is quite a strong requirement.
Let’s say you have a Map<string, number> in JavaScript.
Nobody prevents you from decrementing the numbers in the map, or deleting entries altogether.
This is fine when using any old Map.
But we’re not dealing with any old maps; we’re superimposing the lattice semantics on them.
Trying to merge maps that have been updated non-monotonically breaks our entire construction, and what’s worse, we wouldn’t even notice!
Instead, we’ll have to wrap maps somehow to enforce monotonic updates.
In JavaScript, we can make a design decision whether to create a new class or to create a Proxy that intercepts calls to an underlying map (so that we can use the map as a regular Map).
For demo purposes, I’ll sketch the first option below.
```javascript
class MonotonicMap {
  constructor(partialOrdering, entries) {
    this.map = new Map(entries);
    this.partialOrdering = partialOrdering;
  }

  get(key) {
    return this.map.get(key);
  }

  has(key) {
    return this.map.has(key);
  }

  set(key, value) {
    if (this.has(key)) {
      const oldValue = this.get(key);
      // refuse updates that would go "downwards" in the partial ordering
      if (!this.partialOrdering.isLeq(oldValue, value))
        throw new Error(`Non-monotonic update for ${key}`);
    }
    this.map.set(key, value);
  }
}

const mmap = new MonotonicMap(orderings.any, [["alice", 1], ["bob", 0]]);
mmap.set("bob", 1); // ok
assert.throws(() => mmap.set("alice", 0), /monotonic/); // not ok
```
This map is generic in that it works for any partial ordering and prevents non-monotonic updates.
Consequently, it doesn’t even offer a delete operation.
We could also implement a merge operation directly on MonotonicMap, but we need to take care to somehow handle merging two maps that have different orderings attached (this is difficult in many programming languages, including JavaScript; see here for a possible approach in Scala).
In the end, this will greatly depend on your domain and how far you are willing to go for robustness, i.e., ruling out nonsensical operations.
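For illustration, here is one possible shape such a merge could take, under the assumption that the ordering object also provides a join(a, b) operation (an assumption on my part; so far we only required isLeq). The class is a condensed copy of MonotonicMap from above so that the snippet stands alone:

```javascript
// Condensed copy of MonotonicMap, extended with a possible merge.
class MonotonicMap {
  constructor(partialOrdering, entries) {
    this.map = new Map(entries);
    this.partialOrdering = partialOrdering;
  }
  get(key) { return this.map.get(key); }
  has(key) { return this.map.has(key); }
  set(key, value) {
    if (this.has(key) && !this.partialOrdering.isLeq(this.get(key), value))
      throw new Error(`Non-monotonic update for ${key}`);
    this.map.set(key, value);
  }
  merge(other) {
    // refuse to merge maps whose orderings differ; comparing by object
    // identity is crude, but JavaScript gives us little else here
    if (this.partialOrdering !== other.partialOrdering)
      throw new Error("Cannot merge maps with different orderings");
    const result = new MonotonicMap(this.partialOrdering, this.map);
    for (const [key, value] of other.map)
      result.set(key, result.has(key)
        ? this.partialOrdering.join(result.get(key), value)
        : value);
    return result;
  }
}

// boolean lattice: false ≤ true, join is logical OR (my own definition)
const boolOrdering = {
  isLeq: (a, b) => !a || b,
  join: (a, b) => a || b,
};
```

Since the join of two values is always ≥ each of them, the `set` calls inside `merge` can never trip the monotonicity check.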
Encapsulating state
Arguably, the MonotonicMap implementation is still not quite useful for application developers.
But we can use it as a foundation to implement e.g. a 2PSet, whose delete(key) method performs a set(key, true) on the underlying map.
That way, you can provide an interface that makes sense from a domain point of view and delegates to an implementation that makes sense from an algebraic point of view.
Luckily, we don’t have to invent this programming pattern, since it has already been described in the 1970s as Abstract Data Types.
We could, for example, describe the contract of a 2PSet as follows:
- Initial state: A = {}, R = {}
- Invariant: R is a subset of A
- Operation elements(): return A − R
- Operation add(e): set A := A + {e}
- Operation remove(e): if e is in A, set R := R + {e}; otherwise fail
In other words, the above describes the interface of a 2PSet, whereas the MonotonicMap with an appropriate lattice describes the implementation.
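To make that division concrete, here is a sketch of such a 2PSet facade delegating to a monotonic boolean map. All names are my own; MonotonicMap and the boolean ordering are condensed copies of the earlier pieces so that the snippet stands alone:

```javascript
// Condensed monotonic map (same idea as above, without the extras).
class MonotonicMap {
  constructor(partialOrdering) {
    this.map = new Map();
    this.partialOrdering = partialOrdering;
  }
  get(key) { return this.map.get(key); }
  has(key) { return this.map.has(key); }
  set(key, value) {
    if (this.has(key) && !this.partialOrdering.isLeq(this.get(key), value))
      throw new Error(`Non-monotonic update for ${key}`);
    this.map.set(key, value);
  }
}

const boolOrdering = { isLeq: (a, b) => !a || b }; // false ≤ true

// The 2PSet interface from the contract, delegating to the map.
class TwoPhaseSet {
  constructor() {
    this.state = new MonotonicMap(boolOrdering);
  }
  add(e) {
    // only transition "undefined → false"; re-adding a tombstoned
    // element is a no-op, which is exactly the 2PSet ban
    if (!this.state.has(e)) this.state.set(e, false);
  }
  remove(e) {
    if (!this.state.has(e))
      throw new Error(`${e} is not in the set`);
    this.state.set(e, true); // "false → true" is monotonic and final
  }
  elements() {
    return [...this.state.map].filter(([, dead]) => !dead).map(([e]) => e);
  }
}
```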
If you’re curious about a more formal treatment of this, check out this side note. Otherwise, feel free to skip it.
What’s next?
We’ve seen how CRDTs can cope with deletion of values. But so far, this has been really restricted: once a value is out, it’s out. There are two different ways to make this a bit more flexible. One of them requires a notion of time. So, the next episode will talk about time and causality in distributed systems.
Thanks to the people who've read drafts of this series and provided valuable feedback: Andrea, Clement Delafargue, Heiko Seeberger, Hillel Wayne, Johannes Link, Matthew Weidner, Princess.