Mergeable Replicated Data Types

This topic is about the design of an execution & query model for Mergeable Replicated Data Types in a key-value store, as an additional model for mutable data in the system.

Design overview

Mutable data is an important building block for various parts of the system.
A key-value store of replicated data types would be an appropriate data model for e.g. various data structures for the network subsystem, such as the membership set of domains & topics, records associated with an identity in the distributed identity & name system, as well as applications such as Multi-Chat and storing user data that needs to be synced across multiple devices, etc.

Two main approaches to RDTs exist:

I propose Certified MRDTs as the research direction since they are simpler and more efficient, as execution requires less causal context, garbage collection is simpler, and have a more flexible programming model since it can be used with arbitrary data types and supports composition.

Extending our execution system with a new execution model for MRDTs enables a flexible and extensible programming model on replicated data types that allows application developers to define custom data types with custom merge functions according to applications needs.

MRDTs allow efficient local execution of transactions and local queries in many cases.
When the KV store data needs to be queried by nodes that do not execute the transactions themselves, authenticating the KV store state becomes necessary, by e.g. signatures by a quorum of validators periodically.

Research concerns/areas/questions:

  • Storing causal history
    The causal history is already provided by pub/sub topics and the mempool DAG that store a causal history of cryptographically authenticated operations in a DAG.
    Since transactions rely on a causal order, consensus on total order is not required.
  • Data model & data type definition
    • key-value store with MRDTs
    • data type definitions
      • purely functional data types
      • define/derive merge functions for types
      • OCaml implementation exists that can run as a MirageOS unikernel on the solo5 VM as a sanboxed linux process
      • Juvix implementation would be possible that integrates better with the rest of the system
  • Executing transactions on MRDTs
    • execute merge functions and read/write KV store
    • permission system that allows specifying what operations can be performed on what key by what identities
  • Querying MRDTs
  • Authenticated state & values
    • cryptographically authenticate a state snapshot and possibly individual values
2 Likes

Do I understand correctly (from skimming the papers) that MRDTs/CMRDTs require simply the ability to determine what they call the “lowest common ancestor” (which I understand to mean most recent common ancestor, given a causal history DAG), and this along with the two concurrent versions is sufficient to perform the merge, where the data structure and merge function must be constructed such that eventual consistency is guaranteed?

This is mostly a terminological note, but we should call these “MRDT transactions” or something to distinguish them from the RM transactions, which do require consensus on a total order.

What is this referring to – an implementation of what? Some MRDTs? Do you have a link?

Should/would this itself be a (certified) MRDT? Or is this outside the MRDT system (and if so, does making permission changes require a total order)?

As a note, I think we should try to integrate the causal-order system and the total-order system by periodically notarizing any relevant MRDT states to controllers. If we do this frequently and subsequent messages are required to reference it, this should guarantee a most recent common ancestor of bounded recency, which ensures that the MRDT states don’t get “too far away” (not in terms of being able to merge them, but in terms of user/application experience), and also that the amount of compute/bandwidth needed to compute the latest updates should be bounded (if you have a recent controller-attested snapshot). This is just an idea, but I think something like this should work.

Can you expand upon what you mean by “cryptographically authenticate” here? Are you thinking of zero-knowledge proofs of correct MRDT computation over a history of messages? Signatures? Something else?

Yes, that is the idea.

Right, makes sense.when talking about it in general (however in this topic think it’s sufficiently contextualized)

The CMRDT paper mentions an OCaml implementation of MRDTs, as part of Irmin, but couldn’t find it yet.

One way to do it is to make it an MRDT, another way is to make it part of the Topic Advertisement.

However, in both cases, a permission change would require an epoch change which requires a quorum agreeing on a cut of the DAG that has been seen by the quorum. This is necessary since MRDT execution is often asynchronous, and thus the permission change message may be seen by some and some others not, and even if seen a node might claim it was not, to avoid such issues, thus a quorum and epoch change is necessary.

Yes, this is the idea behind the quorums on state snapshots I mentioned above, which trigger a new epoch on the pub/sub layer and enable garbage collection of transactions from the previous epoch.
The MRDT configuration specifies which nodes are allowed to participate in the quorum and how many nodes are required for it, and the interval it should happen (depends on the topology, usage, and requirements of the MRDT store).
After the quorum nodes with diverging state can sync with nodes in the quorum, and everyone can garbage collect transactions from the previous epoch.
Total order consensus is often not necessary where a lighter weight quorum would suffice instead, however there may be cases where total order consensus is preferred, which allows integrating the two systems

A signature over the snapshot would likely suffice.
Or do you think a ZKP would be useful here?
How would that look like and what properties would it provide?