Mergeable Replicated Data Types

This topic is about the design of an execution & query model for Mergeable Replicated Data Types in a key-value store, as an additional model for mutable data in the system.

Design overview

Mutable data is an important building block for various parts of the system.
A key-value store of replicated data types would be an appropriate data model for e.g. various data structures for the network subsystem, such as the membership set of domains & topics, records associated with an identity in the distributed identity & name system, as well as applications such as Multi-Chat and storing user data that needs to be synced across multiple devices, etc.

Two main approaches to RDTs exist:

I propose Certified MRDTs as the research direction since they are simpler and more efficient, as execution requires less causal context, garbage collection is simpler, and have a more flexible programming model since it can be used with arbitrary data types and supports composition.

Extending our execution system with a new execution model for MRDTs enables a flexible and extensible programming model on replicated data types that allows application developers to define custom data types with custom merge functions according to applications needs.

MRDTs allow efficient local execution of transactions and local queries in many cases.
When the KV store data needs to be queried by nodes that do not execute the transactions themselves, authenticating the KV store state becomes necessary, by e.g. signatures by a quorum of validators periodically.

Research concerns/areas/questions:

  • Storing causal history
    The causal history is already provided by pub/sub topics and the mempool DAG that store a causal history of cryptographically authenticated operations in a DAG.
    Since transactions rely on a causal order, consensus on total order is not required.
  • Data model & data type definition
    • key-value store with MRDTs
    • data type definitions
      • purely functional data types
      • define/derive merge functions for types
      • OCaml implementation exists that can run as a MirageOS unikernel on the solo5 VM as a sanboxed linux process
      • Juvix implementation would be possible that integrates better with the rest of the system
  • Executing transactions on MRDTs
    • execute merge functions and read/write KV store
    • permission system that allows specifying what operations can be performed on what key by what identities
  • Querying MRDTs
  • Authenticated state & values
    • cryptographically authenticate a state snapshot and possibly individual values
2 Likes

Do I understand correctly (from skimming the papers) that MRDTs/CMRDTs require simply the ability to determine what they call the “lowest common ancestor” (which I understand to mean most recent common ancestor, given a causal history DAG), and this along with the two concurrent versions is sufficient to perform the merge, where the data structure and merge function must be constructed such that eventual consistency is guaranteed?

This is mostly a terminological note, but we should call these “MRDT transactions” or something to distinguish them from the RM transactions, which do require consensus on a total order.

What is this referring to – an implementation of what? Some MRDTs? Do you have a link?

Should/would this itself be a (certified) MRDT? Or is this outside the MRDT system (and if so, does making permission changes require a total order)?

As a note, I think we should try to integrate the causal-order system and the total-order system by periodically notarizing any relevant MRDT states to controllers. If we do this frequently and subsequent messages are required to reference it, this should guarantee a most recent common ancestor of bounded recency, which ensures that the MRDT states don’t get “too far away” (not in terms of being able to merge them, but in terms of user/application experience), and also that the amount of compute/bandwidth needed to compute the latest updates should be bounded (if you have a recent controller-attested snapshot). This is just an idea, but I think something like this should work.

Can you expand upon what you mean by “cryptographically authenticate” here? Are you thinking of zero-knowledge proofs of correct MRDT computation over a history of messages? Signatures? Something else?

Yes, that is the idea.