I want to note down the results of our discussion as I understand them in writing here to check if everyone is on the same page. Here is my understanding:
Storage types
Currently, we have four distinct types of storage:
Name | Abstract interface | Concrete structure | Key type | Value type |
---|---|---|---|---|
Commitment tree | Cryptographic accumulator with membership proofs (see survey) | Merkle tree (details tbd, @Moonchild was working on this) | Merkle paths | Field elements |
Nullifier set | Cryptographic accumulator with non-membership proofs | Merkle tree (details tbd) | Merkle paths | Field elements |
Hierarchical index | Tree (see below) | Chained HashSets | Tree paths | Field elements (keys in the blob store) |
Blob store | Key-value store with deletion criteria | Key-value store with deletion criteria | Field elements | (variable-length byte array, deletion criterion) tuples |
Deletion criteria
Deletion criteria are in principle arbitrary predicates. In practice, for now, we will only allow the following specific options (which are efficiently computable):
- delete after block #n
- delete after timestamp #t
- delete after signature from identity #i over data #d
- delete after either predicate #p1 or predicate #p2 (e.g. delete after block #n OR signature)
- store forever
In particular, “delete after signature” should allow for a kind of programmable data availability handoff, where a transaction can request that validators store the results immediately, then delete them as soon as a selected third-party has stored them and signed to attest to this.
Hierarchical index
The hierarchical index, conceptually, is a tree:
The Tree interface provides the ability to:
- efficiently query all children of a particular path (a single node, if the path is full)
e.g.pets/canine/husky
in the above example, returningHuskers,Catky
- prove the aforementioned query
The tree is constructed from resource sub-kinds:
- Let R be a resource
- Let R_{label} be interpreted as an array [l_1, l_2, l_3, ...].
- Let R_{subkind\ k}, the sub-kind k of R, be hash(R_{logic}, l_i).
Only the leaves of the tree refer to actual resources; all of the intermediary nodes are constructed from these sub-kinds. In the example above, say, Huskers
is a resource with label ["pets","canine","husky","huskers"].
Permissions for writing to this hierarchical index are then enforced by the resource logics, so no additional permissioning structure is necessary.
Resource machine interface
The transaction type includes the following storage-related instructions to the execution engine:
- commitments to add to the commitment tree
- nullifiers to add to the nullifier set
- paths to add and remove from the hierarchical index (which will correspond to resources in the transaction)
- blobs and associated deletion criteria to add to the blob store
The execution engine then acts on these instructions.
Garbage collection
The blob store is automatically garbage-collected at a configurable epoch length based on indices for:
- timestamp-based expiry
- block height-based expiry
- signature-based expiry
- we may need to have these signatures submitted in transactions, but hopefully these transactions can be free (only ones which delete a currently stored blob are accepted, which should be sufficient anti-DoS?)
@isheff I recall that you had some points about garbage collection of the hierarchical index and blob store in a related way, but I don’t remember the plan well enough to write it down. Would you be able to?