Stored data format: resource machine <> storage

I want to note down the results of our discussion as I understand them in writing here to check if everyone is on the same page. Here is my understanding:

Storage types

Currently, we have four distinct types of storage:

Name Abstract interface Concrete structure Key type Value type
Commitment tree Cryptographic accumulator with membership proofs (see survey) Merkle tree (details tbd, @Moonchild was working on this) Merkle paths Field elements
Nullifier set Cryptographic accumulator with non-membership proofs Merkle tree (details tbd) Merkle paths Field elements
Hierarchical index Tree (see below) Chained HashSets Tree paths Field elements (keys in the blob store)
Blob store Key-value store with deletion criteria Key-value store with deletion criteria Field elements (variable-length byte array, deletion criterion) tuples

Deletion criteria

Deletion criteria are in principle arbitrary predicates. In practice, for now, we will only allow the following specific options (which are efficiently computable):

  • delete after block #n
  • delete after timestamp #t
  • delete after signature from identity #i over data #d
  • delete after either predicate #p1 or predicate #p2 (e.g. delete after block #n OR signature)
  • store forever

In particular, “delete after signature” should allow for a kind of programmable data availability handoff, where a transaction can request that validators store the results immediately, then delete them as soon as a selected third-party has stored them and signed to attest to this.

Hierarchical index

The hierarchical index, conceptually, is a tree:

The Tree interface provides the ability to:

  • efficiently query all children of a particular path (a single node, if the path is full)
    e.g. pets/canine/husky in the above example, returning Huskers,Catky
  • prove the aforementioned query

The tree is constructed from resource sub-kinds:

  • Let R be a resource
  • Let R_{label} be interpreted as an array [l_1, l_2, l_3, ...].
  • Let R_{subkind\ k}, the sub-kind k of R, be hash(R_{logic}, l_i).

Only the leaves of the tree refer to actual resources; all of the intermediary nodes are constructed from these sub-kinds. In the example above, say, Huskers is a resource with label ["pets","canine","husky","huskers"].

Permissions for writing to this hierarchical index are then enforced by the resource logics, so no additional permissioning structure is necessary.

Resource machine interface

The transaction type includes the following storage-related instructions to the execution engine:

  • commitments to add to the commitment tree
  • nullifiers to add to the nullifier set
  • paths to add and remove from the hierarchical index (which will correspond to resources in the transaction)
  • blobs and associated deletion criteria to add to the blob store

The execution engine then acts on these instructions.

Garbage collection

The blob store is automatically garbage-collected at a configurable epoch length based on indices for:

  • timestamp-based expiry
  • block height-based expiry
  • signature-based expiry
    • we may need to have these signatures submitted in transactions, but hopefully these transactions can be free (only ones which delete a currently stored blob are accepted, which should be sufficient anti-DoS?)

@isheff I recall that you had some points about garbage collection of the hierarchical index and blob store in a related way, but I don’t remember the plan well enough to write it down. Would you be able to?

1 Like