Stored data format: resource machine <> storage

  • Resource machine: defines the state to be stored and the security requirements
  • Storage: cares about how to store the state

The stored state is represented as a key-value store. The goal of this discussion is to specify the structure of this key-value store: what would be the keys and what would be the values for each state component recognized by the resource machine: CMtree, NFset, data blobs (think encrypted resources)?

A B key value
CMtree leaf timestamp + cm 1
node prefix of the timestamp node value (the hash of the child nodes)
NFset nf 1
Mutable indices kind hash
Data blobs hash blob

Timestamps
Associate a timestamp with each tree node s.t. lower depth tree node corresponds to a less specified timestamp: a parent node timestamp is a prefix of the child node timestamp. The leafs of the tree would have fully specified timestamps: t1. * . * - t1.t2. * - t1.t2.t3

Questions

  • What to use as a key for data blobs?
    • What would be the implications of using resource kind as a key for data blobs?
    • If we use a resource kind for data blob keys, can we organise the resource kinds in any kind of hierarchy?
    • What would be the security implications of using resource kinds as data blob keys?
    • Note: the value stored under the key in the case we use the resource kind as a key would be determined by the resource kind
  • Do we want to determine the position of the next commitment added to the tree sequentially? It would allow to conveniently forget about some parts of the tree after a certain points, and whoever has unconsumed resources in that part of the tree would only need to store the sub-tree path (and fetch the rest). What are the security implications?
  • Do we want to associate an expiration date with each resource/nullifier and stop storing the data once the resource has expired?
1 Like

Discussion of resource sub-kinds and tree indexing:

I want to note down the results of our discussion as I understand them in writing here to check if everyone is on the same page. Here is my understanding:

Storage types

Currently, we have four distinct types of storage:

Name Abstract interface Concrete structure Key type Value type
Commitment tree Cryptographic accumulator with membership proofs (see survey) Merkle tree (details tbd, @Moonchild was working on this) Merkle paths Field elements
Nullifier set Cryptographic accumulator with non-membership proofs Merkle tree (details tbd) Merkle paths Field elements
Hierarchical index Tree (see below) Chained HashSets Tree paths Field elements (keys in the blob store)
Blob store Key-value store with deletion criteria Key-value store with deletion criteria Field elements (variable-length byte array, deletion criterion) tuples

Deletion criteria

Deletion criteria are in principle arbitrary predicates. In practice, for now, we will only allow the following specific options (which are efficiently computable):

  • delete after block #n
  • delete after timestamp #t
  • delete after signature from identity #i over data #d
  • delete after either predicate #p1 or predicate #p2 (e.g. delete after block #n OR signature)
  • store forever

In particular, “delete after signature” should allow for a kind of programmable data availability handoff, where a transaction can request that validators store the results immediately, then delete them as soon as a selected third-party has stored them and signed to attest to this.

Hierarchical index

The hierarchical index, conceptually, is a tree:

The Tree interface provides the ability to:

  • efficiently query all children of a particular path (a single node, if the path is full)
    e.g. pets/canine/husky in the above example, returning Huskers,Catky
  • prove the aforementioned query

The tree is constructed from resource sub-kinds:

  • Let R be a resource
  • Let R_{label} be interpreted as an array [l_1, l_2, l_3, ...].
  • Let R_{subkind\ k}, the sub-kind k of R, be hash(R_{logic}, l_i).

Only the leaves of the tree refer to actual resources; all of the intermediary nodes are constructed from these sub-kinds. In the example above, say, Huskers is a resource with label ["pets","canine","husky","huskers"].

Permissions for writing to this hierarchical index are then enforced by the resource logics, so no additional permissioning structure is necessary.

Resource machine interface

The transaction type includes the following storage-related instructions to the execution engine:

  • commitments to add to the commitment tree
  • nullifiers to add to the nullifier set
  • paths to add and remove from the hierarchical index (which will correspond to resources in the transaction)
  • blobs and associated deletion criteria to add to the blob store

The execution engine then acts on these instructions.

Garbage collection

The blob store is automatically garbage-collected at a configurable epoch length based on indices for:

  • timestamp-based expiry
  • block height-based expiry
  • signature-based expiry
    • we may need to have these signatures submitted in transactions, but hopefully these transactions can be free (only ones which delete a currently stored blob are accepted, which should be sufficient anti-DoS?)

@isheff I recall that you had some points about garbage collection of the hierarchical index and blob store in a related way, but I don’t remember the plan well enough to write it down. Would you be able to?

1 Like

I’d note that “delete after signature” blobs should probably not be read during post-ordering execution: we need post-ordering execution to be deterministic, and produce the same results on all replicas. I only some replicas have seen the signature and others have not, then their executions will produce different results. The alternative here would be that all such signatures have to be logged in replicated state machine: essentially each “store until you get a signature” blob can be deleted by a transaction featuring that signature.

I’m not sure exactly what @cwgoes is referring to concerning garbage collection of the hierarchical index. I do, however, have thoughts about garbage collection of commitments and nullifiers:

  • If, in some sense, we fill in the commitment tree “from left to right,” we can garbage collect old sub-trees periodically, and require that people who want to prove those commitments have to remember their proofs.
  • Garbage collecting nullifiers is harder. We could, however, give each resource an expiration date (so you’re not allowed to nullify the resource after this date), and then store each nullifier with an expiration date proven to be later than the relevant resource’s expiration date. Then we can delete nullifiers after their expiration date has passed, and be sure the corresponding resource won’t be nullified again thereafter.
2 Likes

We may want this an option, actually. Often times users do just want the validators to temporarily store the data (to ensure that someone does) and don’t need to access it again in the state machine. Sometimes they might need to access it again in the state machine. Perhaps we can store this choice in a flag, and charge higher gas for storing blobs in a way which can be accessed during transaction execution.

I don’t recall the exact plan, but I remember that we discussed something about keeping the index and data blob store in sync, so that we avoid “dangling pointers”. Does that ring any bells?

This is an option. Another more lightweight (complementary) form is to split nullifier sub-sets based on expiration epochs, and stop storing old sets after awhile, but keep some Merkle root (so that users or groups of users elsewhere could still store old nullifiers if they want to, and thereby make the proofs).

Right, so there are 2 things we want to avoid with the blob store / index split:

  • an index refers to a blob that has expired, but some validators have deleted it, and some have not, so a post-ordering execution that involves looking up the index and fetching the blob has non-deterministic effects. We can avoid this by enforcing that deletions “happen” at a specific time.
  • an index refers to a blob that has been deleted, so now we’ve just got a hash in the index that doesn’t look up anything, and no clear rules on when (if ever) we should remove that index element. We can avoid this by having blobs track incoming index pointers (yay reference counting!) and then remove those index pointers when the blob is deleted.

The point of nullifiers, as I understand it, is that we check (after ordering) that some specific nullifiers don’t exist. Merkle roots don’t help for this, and the ability to generate new proofs that nullifiers do exist doesn’t really help either.

1 Like

In this case the user would make a proof that a specific nullifier doesn’t exist in a particular set with a particular Merkle root, the chain can check that the Merkle root matches the one it has stored, and update its stored Merkle root to the new root for the new set with the now-revealed nullifier added.

which would require that any users interested in doing this update their stored Merkle trees in sync with the chain, or their proofs would be about old (and therefore irrelevant) roots.

Yes, that’s right, but depending on how we structure the old nullifier set Merkle tree (perhaps hierarchically in some way) users may not need to store/sync all of it. It would also likely make sense in this case to do some form of transaction (and proof) aggregation in alignment with where the old nullifier set is actually stored.

I need to finish my commitment tree design space exploration post, but consider the following slice of design space: use a fixed height-32 tree, and logically divide it into a top half and a bottom half. The top half needs O(2^16) space; it’s no big deal for everyone to store it forever and send it to whomever. Every leaf of the top half corresponds to a subtree with 2^16 commitments; as commitments are added, the hash of that node will change as commitments are added to its subtree, until the 2^16th one is added, at which point it will never change again. As a client, once I get a new commitment, I just need to stay online and listen to the next <2^16 commitments (not very many!) before my subtree fills up and I can make a path to a now-immutable leaf node in the top half of the tree. At this point, I can drop off the map indefinitely, then come back online, request the entire top half of the tree (also not very much!—and I don’t leak any information) and use that to build a path to a current root.

There is the question of what happens if I go offline before I hear all <2^16 commitments in my subtree. We should expect that somebody will keep around the entirety of the tree forever (e.g. to do analysis on it). If nothing else, then I can pay to have somebody transfer the whole tree to me. In this case, can trade off privacy with transfer size/time by potentially requesting only a part of the tree rather than the whole thing. But the usual case is fairly low transfer costs on order of sqrt(n).

(Edit: I was told there may be privacy problems with always proving against a recent root; can do a cheap exponential backoff thingy that avoids that and still bounds space/transfer pretty tightly.)

I think that basic idea works, although I’m not sure 2^16 is the optimal constant.

To clarify: when I add a new commitment, where do I put it? Sequentially after the previous (meaning we fill the leaves “left-to-right”)? Ideally, we’d be able to add commitments concurrently, but doing so in batches (fixed size or time interval) is probably fine.

For a slightly more general version:

The first “N” bits (say, 16) of the key for a commitment are a timestamp. (I don’t know how we specify the remaining bits, if any, for the commitment key). At any given time, any portion of the commitment Merkle tree to the “left” of the current timestamp is “old” and will no longer change. Validators therefore only have to remember the top (say, minute) levels of old parts of the tree, anyone who wants to “prove” an old commitment needs to get a merkle proof for sometime after the next minute, and they can be sure their proof will be useable forever. We could even have validators forget levels of the tree based on how old they are (they remember every minute’s root for at least a year, then remember every hour’s root for at least a decade, then remember every day’s root for at least a century, etc.), so people who want to prove their commitments a long way into the future might have to wait longer to get a proof useable further in the future.

(Note: we don’t really want to use minutes, hours, days, etc. We probably want powers of 2 numbers of milliseconds or something like that)

General question: how many commitments are we expecting each chain to get per second? If, for any reason, they’re keeping track of commitments from other chains, include those in the count. I ask because, if we’re storing 1000 commitments per second, then we’ll fill a 32-bit merkle tree in 2 months, so we should be talking about some bigger constants.

Can you explain more what you find problematic about adding commitments sequentially? I don’t see why it inhibits concurrency, since we already have to order transactions.

The idea of using real time is intriguing. It has the drawback of being inefficient when activity is low but, corrolarily, just using commitment count means that when activity is low you don’t know how long you have to wait to ‘lock in’ your commitment. So maybe go for a hybrid approach?

A minute isn’t very much time—I was thinking something closer to a week would be more efficient and not prohibitive for clients.

On the topic of trees filling up—I suggested variable height trees, which would imply a definition of ‘old’ that varies with tree size. (Either constant #high bits/variable #low bits, or variable high/constant low, or variable/variable.) Unless associative hashing turns out to be secure and zk-viable (which is very plausible but currently not clear), that entails variable size proofs, which @xuyang and @vveiln tell me is ‘annoying’. They suggested to use a fixed-size (say, height 32) tree, and when that fills up, make another. You have to reveal your choice of tree when making a commitment proof, which leaks some information, but this can be ameliorated by revealing multiple trees and secretly choosing one of them.

We could even have validators forget levels of the tree based on how old they are (they remember every minute’s root for at least a year, then remember every hour’s root for at least a decade, then remember every day’s root for at least a century, etc.), so people who want to prove their commitments a long way into the future might have to wait longer to get a proof useable further in the future.

I like this. I think this is basically equivalent to remembering a constant number of high bits in a variable-height tree. If you also keep the entire left fringe of the tree, then if you wait an amount of time at most proportional to the current age of the tree then you get a proof that works forever (this is basically what I was referring to wrt exponential backoff).

It depends how fast you want to be able to read proofs. We do have a global order, but we want to process transactions concurrently, in a way that is consistent with the global order (preserving serializability). Normally, this would mean that if 2 transaction candidates don’t touch any of the same data, they can be processed completely in parallel. Unfortunately, if we want a Merkle root update available immediately after any transaction candidate has finished, then every transaction candidate that produces commitments (which is probably most of them) would need to update the same piece of state: the merkle root itself.

We can avoid this problem with batching. If you’re willing to wait a little while (say, 1 second) after the transaction candidate has “committed” before the Merkle root reflects the commitments, then we can do batch updates to the commitment merkle tree using all the transactions from the last second. As a bonus, these batch updates involve less total hash computation than doing individual updates sequentially.

The order of the commitments within the batch does not have to reflect the transaction candidate order, but doing so is not a problem.

At one time, I was talking to someone who was under the impression that the 32-bit commitment keys were determined by the commitment itself, and so the commitments could not be added sequentially, but it seems that is not the case.

Is it bad for the tree to be sparse? Why not just have each commitment key be 320 bits, with the first 64 being a timestamp, and the last 256 being the commitment hash itself. We could remember old time roots at whatever granularity (and for however long) we want (perhaps each chain would have its own policy), and we’d never run out of space.

Would it help to use some kind of merkle-ized balanced lookup tree structure? I vaguely recall reading about such a thing here.

All of this makes sense to me. But it doesn’t seem to be in conflict with having the nodes be added to the tree sequentially. I think we might be just be using different words for the same things and the same words for different things? What I mean is that all the occupied nodes in the tree at any given point in time should be contiguous and left-aligned. When we go from anchor m to m+1, we add n new leaves—those should be contiguous, and go in the leftmost n free spots; but there’s no need to consider any intermediate states between m and m+1—we don’t have to make a decision about where any of the commitments lives before we decide to make m+1. (And we can still speculatively execute the transactions that create those n commitments, leaving the commitments in limbo until m+1 is created!)

I am under the impression that there is no need for the commitment tree to be content-addressed. If it is to be content addressed, then 32 isn’t nearly enough bits.

A very tall tree bloats proofs (unless the associative hash thing works out…), which doesn’t seem nice, so better to avoid it if possible.

I think that’s only necessary if the tree is sparse. If it’s dense, and nodes are added sequentially, then nothing special is required to keep it balanced.

With the balanced structure, we can have a content-addressed tree—say, with 256-bit keys—and a smaller maximum capacity—of, say, 2^64—and have proofs of size proportional to 64 rather than 256. I don’t understand what the advantage of that would be though.

What I mean by ‘inefficient’ is: there are some nodes that everybody has to remember forever, and some that most people will forget about; ideally, for every node in the former category, there are a lot in the latter category. The lower the ratio, the greater the efficiency, because it’s a smaller proportion of the overall tree that we have to remember.

Why do we want to store nullifiers in a Merkle tree?

Because we want the ability for someone to make a proof of non-inclusion with respect to a particular nullifier set commitment (= Merkle tree root, if we use a Merkle tree) that someone who stores only the commitment (but not all the nullifiers) can check. Perhaps there’s a more efficient way of doing that, in this case, since we don’t need proofs of inclusion - wdyt?

As far as I know to have non-membership proofs with a Merkle tree, the leaves have to be sorted. I can see a way to have the nullifiers sorted if we have the tree size to be equal to the nullifier size (256 bit) . Am I missing anything here?

I’m also wondering if it could be useful in some contexts to have inclusion proofs for nullifiersl, like if some logic requires it

Got a bit more questions about the other stuff:

  1. Writing to the hierarchical index is controlled by the resource logics. Is it a different mechanism from resource creation permissions or is it implied that ‘creating a resource is permitted’ = ‘writing to the storage is permitted’. If these two actions are different, how exactly does it work?
  2. All subkinds in the hierarchy are computed using the same logic. Does it mean that subkinds can only differ in labels, not logics? If subkinds can have different logics, how does it work (which logic is used to compute what)? If for this index we don’t consider such hierarchies, does it mean that this index cannot be used to store hierarchies where the logics differ for subkinds?
  3. Data blob storage: What is meant by garbage collection here? Is it different from what is happening after the deletion criteria is satisfied?
2 Likes

It is the same (controlled by resource logics, and the resource machine enforces the usual criteria). In principle the execution engine could be used with a different transaction execution logic which would manage permissions differently than the resource machine does, but we aren’t going to build one.

This is an excellent question, I think my post above leaves this somewhat underdetermined. I think even if we don’t directly support hierarchies with different logics, we can support it indirectly by one level of indirection in the resource logic (which just e.g. keeps an index of pets, and stores some references to different logics that have to be invoked in different cases). For now, I would say keep it simple (per the original description), but we should definitely bookmark this question and probably work through some specific application cases to get a better idea of what the tradeoffs are.

For data blob storage, as far as I know garbage collection is just deletion after the deletion criterion is satisfied (paging @isheff to double-check).

We’ve also been discussing garbage collection in the context of commitments and nullifiers, which is a separate problem.

1 Like