Clarifying proof structures

This thread is a discussion of the optimal design of the proof-carrying components of the data structures.

Context

Actions contain two types of proofs: resource logic proofs and compliance proofs. A transaction has just one proof of one type: the delta proof. I'll start with the action data structure, as it is more nuanced, and come back to the transaction later.
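For orientation, here is a rough sketch of the shapes involved. This is a hedged Python sketch only; all field names are mine, not the spec's.

```python
from dataclasses import dataclass

# Illustrative shapes only; field names are mine, not the spec's.

@dataclass
class Action:
    commitments: set   # commitments of created resources
    nullifiers: set    # nullifiers of consumed resources
    app_data: dict     # keyed by resource tag
    proofs: set        # resource logic proofs and compliance proofs

@dataclass
class Transaction:
    actions: list
    roots: set         # commitment tree roots
    delta_proof: bytes # the single delta proof
```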

In the current spec version, all action proofs are bundled together in a set, which is suboptimal for verification. Recall that the proof verification function is defined as

PS.Verify(PS.VerifyingKey, PS.Instance, PS.Proof) : Bool

With all action proofs in one set, we need to answer two questions before we can verify a proof:

  1. how do we know which proof corresponds to which statement?
  2. where do we get the corresponding instance?

Proof records

Initially I tried to address the second question by introducing proof records: a data structure that carries all of the required inputs for the verify function, ProofRecord = (PS, PS.VerifyingKey, PS.Instance, PS.Proof). With these, verification becomes easy:

all(map(lambda pr: verify(pr), action.proofRecords))
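As a minimal sketch (assuming Python-like pseudocode and attribute-style proof records), verification with proof records could look like this:

```python
from dataclasses import dataclass

@dataclass
class ProofRecord:
    proving_system: object  # PS
    verifying_key: bytes    # PS.VerifyingKey
    instance: object        # PS.Instance (public inputs)
    proof: bytes            # PS.Proof

def verify(pr: ProofRecord) -> bool:
    # Every input PS.Verify needs is carried inside the record itself.
    return pr.proving_system.verify(pr.verifying_key, pr.instance, pr.proof)

def verify_action_proofs(action) -> bool:
    return all(verify(pr) for pr in action.proofRecords)
```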

However, that was suboptimal because it duplicates data: all of the other action components (commitments, nullifiers, and app_data) are part of the instances too.

Well, we could ditch those components and keep only the proofRecords component, but then every time anyone needs a set of commitments/nullifiers, they would have to call some function that assembles the required set from proofRecords. For applicationData, I don't know how annoying it would be to have it split into proofRecord chunks - probably not much, actually. Another issue is with compliance proofs, which require a higher-level structure - the Merkle tree root - as input: the roots would have to be explicitly part of each proofRecord for each compliance proof, which we clearly don't want.
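For illustration, if the explicit commitment/nullifier sets were dropped, the reassembly helpers mentioned above would look something like this (a sketch; the attribute names are assumptions of mine):

```python
def action_commitments(action) -> set:
    # Reassemble the commitment set from the proof record instances.
    return {cm for pr in action.proofRecords for cm in pr.instance.commitments}

def action_nullifiers(action) -> set:
    return {nf for pr in action.proofRecords for nf in pr.instance.nullifiers}
```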

So, I removed ProofRecord. As mentioned above, plain sets of proofs don't make verification easy either, so here is another option: a structure in between plain proofs and overkill proof records.

Proposed structure for action proofs

  1. We split proofs into two structures: resourceLogicProofs and complianceProofs
  2. These data structures are not simple sets: `resourceLogicProofs: Map Tag (PS.VerifyingKey*, PS.Proof)` - so for each RL proof, we are told which resource is the self resource. Since app_data also uses tags as keys, we can easily get the app_data part of the instance and add all the commitments/nullifiers from the existing components (sketched below).

*Instead of a verifying key we actually want LogicRefHash here, but for simplicity let's think of it as a verifying key.
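A sketch of what the map and the instance assembly for a single RL proof could look like (field names are illustrative, not the spec's):

```python
# resourceLogicProofs: Map Tag -> (VerifyingKey, Proof)
# in Python terms, roughly: dict[Tag, tuple[VerifyingKey, Proof]]

def rl_instance(action, tag):
    # The map key tells us which resource is the self resource; the rest of
    # the instance is assembled from components the action already carries.
    return {
        "self": tag,
        "commitments": action.commitments,
        "nullifiers": action.nullifiers,
        "app_data": action.app_data.get(tag),
    }
```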

For compliance proofs, we can either use a similar map with a set of tags instead of a single tag, or use proper proof records, given that

  1. for compliance proofs, we also need some data structure similar to appData for logic proofs
  2. we don’t have to have it in a separate set

This makes binding proofs and their inputs easier (a combined sketch follows the list):

  1. For RL proofs, take a proof, set the tag as the self resource, add the rest of the commitments/nullifiers as inputs, find the relevant app_data using the same tag and add it to the inputs - this should give us all the inputs required to verify RL proofs
  2. For compliance proofs, either take the whole proof record and verify it, or use the key as the input resource commitments/nullifiers, then use the same key for another data structure that stores the rest of the required inputs
  3. To ensure all resources have associated proofs, we can get all of the keys and compare them to the sets of commitments/nullifiers (should be equal), and also make sure no tag is used twice
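Putting the three steps together, a hedged sketch of action verification under this proposal, assuming compliance proofs are kept as full proof records and building on the rl_instance helper above:

```python
def verify_action(action, ps) -> bool:
    # 3. completeness: the RL proof keys must coincide with the union of
    #    commitments and nullifiers; using a map also rules out duplicate tags
    tags = set(action.resource_logic_proofs.keys())
    if tags != action.commitments | action.nullifiers:
        return False
    # 1. RL proofs: the key is the self resource, the rest of the instance
    #    comes from the other action components (see rl_instance above)
    for tag, (vk, proof) in action.resource_logic_proofs.items():
        if not ps.verify(vk, rl_instance(action, tag), proof):
            return False
    # 2. compliance proofs: each record already carries its own instance
    return all(
        ps.verify(pr.verifying_key, pr.instance, pr.proof)
        for pr in action.compliance_proofs
    )
```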

Transaction

For the transaction delta, we might also want to modify the delta proof type to have the associated public inputs/verifier key easily discoverable, but we can’t fully describe the new structure yet because:

  1. there are interoperability issues (if the same transaction contains multiple RM resources, who decides which delta proof proving system to use? → what verifying key to associate?)
  2. delta proof constraints and inputs are not properly described yet

So I'm mostly adding this part to signal that it might require slight modifications as well.

Merkle roots

I think the same problem applies to the Merkle roots set. How does the verifier know which root to use to verify a particular compliance proof? Including the root in the proof record would solve the problem, but it would introduce data duplication. At the same time, we can't easily switch to a map because the same root might be used for multiple proofs (and we might even need multiple roots for the same compliance proof, which I now realise might be a problem for the circuit). I can't see a smarter solution that would make root discovery easy without implying too much data duplication. The simple solution would probably be:

  1. include roots in proof records
  2. instead of providing the roots in the instance, accumulate them all together into a single input (rather than a variable number of inputs) and provide the actual roots in the witness. Verification would then (sketched below):
    • check that the roots provided in the witness accumulate to the publicly provided value from the instance
    • for each consumed resource commitment, verify the path using the corresponding root
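A minimal sketch of option 2, using a simple hash chain as the accumulator; the real circuit would use its native hash, and the Merkle path checks are omitted:

```python
import hashlib

def accumulate_roots(roots: list[bytes]) -> bytes:
    # Fold all witness roots into a single public value for the instance.
    acc = b""
    for r in roots:
        acc = hashlib.sha256(acc + r).digest()
    return acc

def check_compliance_roots(instance_root_acc: bytes, witness_roots: list[bytes]) -> bool:
    # Step 1: the witness roots must accumulate to the public value.
    if accumulate_roots(witness_roots) != instance_root_acc:
        return False
    # Step 2: for each consumed resource commitment, verify its Merkle path
    # against the corresponding witness root (omitted here).
    return True
```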

Note that fixing the compliance unit size to 1 input and 1 output resource solves half of the problem: we still need a way to figure out what root corresponds to what proof, but the circuit would expect just one root (so part two of the solution above wouldn't be needed).

I’m not sure what advantages it might bring to have compliance units of a different size, so I would:

  1. fix the compliance unit size in the spec (right now it isn't fixed) - but that would mean the transparent case would have to follow this too
  2. move roots to compliance proof records from the transaction data structure

Thanks for the write-up!

Just as a note, it’s possible to avoid data duplication in general by referencing data by hash, and including a separate “data” field in whatever structure is being passed around which contains all the actual data.
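For instance, the structure could carry hash references plus a single data field mapping each hash to the referenced bytes. This is a generic sketch of the pattern, not a spec proposal:

```python
import hashlib

def ref(data: bytes) -> bytes:
    # Content address: the reference is just the hash of the data.
    return hashlib.sha256(data).digest()

# Whatever structure is passed around references data by hash...
envelope = {
    "root_refs": [],  # hashes of the roots a compliance unit actually uses
    "data": {},       # hash -> data, each piece stored exactly once
}

def resolve(envelope: dict, h: bytes) -> bytes:
    # Dereferencing is a single lookup, so nothing is duplicated.
    return envelope["data"][h]
```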

This approach seems reasonably straightforward to me.

For the two possible methods of compliance proof representation that you propose:

  1. a map with a set of tags (could be an ordered set as a key), or
  2. full proof records

Is there any data duplicated in approach (2)?

Whatever we pick here, all of the exact steps required to verify (1, 2, and 3 in your list) should be written out explicitly in the RM specs.

This is true, but it’s not a lot of data. We could perhaps even reference Merkle roots by an even shorter hash (or the first few characters), albeit with some chance of collision (in which case all Merkle roots matching the prefix could be tried).
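A small sketch of the prefix idea: resolve a short reference by trying every root that matches it (usually exactly one, occasionally more on collision):

```python
def matching_roots(roots: set[bytes], prefix: bytes) -> list[bytes]:
    # All roots that start with the short reference; the verifier would
    # try each candidate until one of them verifies.
    return [r for r in roots if r.startswith(prefix)]
```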

I couldn’t quite follow this - is it possible the list formatting is a bit wonky? What do you mean by accumulating all the roots together?

My understanding is that by doing so, to find the actual data, we have to hash all of the possible referenced options and pick the one that matches the included hash, which sounds like some work.

I don't have preferences, so my question is: what is worse for the system, duplication of data or the extra computations? These computations would have to happen every time a transaction is verified. If we verify all transaction proofs at once, it is at most n extra hashes computed per proof, with n being the maximum number of instance components for a proof in that transaction.

Yep, but I think it is only the roots, because we only need one compliance proof per resource (as opposed to RL proofs, which include the same resource in a proof numberOfResourcesInTheAction times).

If it isn't a big deal, I feel like it is easier to just copy the roots, but I don't know the size of the deal. If we use a shorter hash, would its length be fixed in the spec? I think the hash size depends on the max transaction size, and a transaction, in theory, can be arbitrarily large. If the limitation on transaction sizes is computation/network, then over time the comfortable transaction size might grow (if computers get stronger), and the hash size should ideally adapt to that… I'm a bit overthinking it here.

I mean the approach we discussed a few times before: when we have multiple roots expected as input for the compliance circuit, put the inputs (in this case commitment tree roots) in a Merkle tree, provide the input tree root as the instance, and provide the rest as witness.

Actually, now reading it, I'm not sure we would need to accumulate roots: multiple roots are only a problem if the number of roots is not fixed, and in our case it is fixed for the same circuit. But different compliance circuits (in different RM instances) might have a different number of roots.


Computing n extra hashes doesn't matter very much; compute is (generally) cheaper than bandwidth. Modern consumer hardware can compute millions of hashes per second. We can reason about very specific performance tradeoffs later, but if we want a decent general pattern for now, I think referencing data by hash is quite alright.

Let’s just copy the roots for now. If it’s a problem, we can optimize it later.

Ah, I see, makes sense. I think it’s OK to require a fixed number of roots for the compliance circuit for now - anyone making a proof will always just use the most recent root they know, right?


A quick update on compliance units

  1. Make compliance units an explicit sub-structure of actions (previously they were more of an implicit partitioning induced by the way we construct compliance proofs), each carrying a compliance proof along with its instance and verifying key.
    • Consequence: duplication of roots (tx), duplication of tags (action). To fix:
  2. Reference roots (stored in tx) in compliance units
  3. Reference cm/nf (stored in action) in compliance units
    • Note: a verifier would need to first map the tag references to the actual tags and pass those to PS.Verify (extra dereferencing compared to duplication; see the sketch below)
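A sketch of the resulting compliance unit with references instead of copies; resolve_root / resolve_tag and all field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ComplianceUnit:
    verifying_key: bytes
    proof: bytes
    root_refs: list   # references to roots stored in the transaction
    tag_refs: list    # references to cm/nf stored in the action

def resolve_root(tx, r):
    # Hypothetical lookup: assumes the tx stores roots addressable by reference.
    return tx.roots[r]

def resolve_tag(action, t):
    # Hypothetical lookup: assumes the action stores tags addressable by reference.
    return action.tags[t]

def compliance_instance(tx, action, unit: ComplianceUnit) -> dict:
    # The extra dereferencing step: map references back to actual values
    # before handing them to PS.Verify.
    return {
        "roots": [resolve_root(tx, r) for r in unit.root_refs],
        "tags": [resolve_tag(action, t) for t in unit.tag_refs],
    }
```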

I think the roots can still be referenced from the TX, if the references take up sufficiently less space than duplicating the roots would.

Regarding the tags: does the action need the same level of access to the tags, or can e.g. verifying the action just be recursive and introspect the compliance units when needed?

> I think the roots can still be referenced from the TX, if the references take up sufficiently less space than duplicating the roots would.

That is why we have steps 2 and 3. In both cases the idea is to use shorter hashes to reference the roots/cm/nf.

> Regarding the tags: does the action need the same level of access to the tags

Tags are used in both compliance units and logics. Unlike compliance units, the logics in an action take all of the action's tags as input.
