In this post I will explain a concrete technique to aggregate all the proofs of a final (balanced) transaction. It has yet to be tested in practice. Pointing out mistakes, suggesting improvements, or sharing opinions is welcome.
Given a program \mathsf P and an output x, the computational integrity statement that a proof \pi attests to is: "there exists an input w such that \mathsf P(w) = x".
The NP relation we are dealing with is \mathcal R:=\{((\mathsf p, x);w)\ |\ \mathsf P(w) = x\}. Thus, the input w is the witness. The description of the program (in lowercase \mathsf p) and the output form the instance. Let (\mathbf P, \mathbf V) be a proof system for \mathcal R.
Since the aggregation technique leverages recursive verification, we require succinct verification. This means the runtime of the verifier \mathbf V on an instance (\mathsf p, x) is polylogarithmic in the runtime of the program \mathsf P on input x. Many SNARKs provide succinct verifiers (including the STARK scheme used in RISC0).
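To make the setting concrete, here is a toy instance of the relation \mathcal R in Python, taking SHA-256 as the program \mathsf P purely for illustration; the names program_P and witness are mine, not part of the RM.

```python
# A toy instance of R = {((p, x); w) | P(w) = x}, with SHA-256 playing the
# role of the program P. A SNARK proof pi for the instance (p, x) would
# convince a verifier that some w with P(w) = x is known, without revealing
# w and without re-running P.
import hashlib

def program_P(w: bytes) -> bytes:   # the program P
    return hashlib.sha256(w).digest()

witness = b"secret input w"         # the witness w
x = program_P(witness)              # the claimed output

# Membership in R (this is what the proof attests to, succinctly):
assert program_P(witness) == x
```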
What this post covers. I will start by explaining the basic idea and then show how aggregations can be combined. From there we will see how to generalize to ‘universal’ aggregations, and finish by sketching how to connect the RM layer with the settlement layer by aggregating transaction proofs.
Aggregating proofs – the basic idea
In the simplest case, all proofs are for the same program. Let us take as an example the compliance logic (program) \mathsf P_{c}(w_c) \rightarrow x_c run over compliance units. The exact structure of the witness w_c, the instance x_c, and the description of \mathsf P_{c} is unimportant for the purposes of aggregation. We only care about the existence of a SNARK with a succinct verifier \mathbf{V} that accepts valid proofs: \mathbf V(\mathsf p_{c},x_c,\pi_c) = 1.
We aggregate compliance proofs one step at a time. At each step, a fresh output x_c of the compliance program is verified using its proof \pi_c and accumulated in a chaining hash h (the bytes of x_c are hashed). The previous aggregation is also verified. In pseudocode:
\mathsf P_{ac}(\mathsf p^{in},h^{in},\pi^{in},x_c,\pi_c)\rightarrow (\mathsf p^{out},h^{out}):
- Check 1 = \mathbf V (\mathsf p_{c},x_c,\pi_c) // the description of the compliance program \mathsf P_{c} is hard-coded in \mathsf P_{ac}
- If h^{in}\neq h_0, check 1 = \mathbf V (\mathsf p^{in},(\mathsf p^{in},h^{in}),\pi^{in})
- Output h^{out} := Hash(x_{c},h^{in}) and \mathsf p^{out}:=\mathsf p^{in}
The aggregation program takes as input its own description \mathsf p^{in} := \mathsf p_{ac}. This just means that it verifies an execution of itself for a previous chained hash h^{in}. In other words, the aggregation is an incrementally verifiable computation (IVC).
Generated transcripts are dynamic and of virtually any length. One action with two compliance units will aggregate twice, while another action with three compliance units will perform three aggregations.
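To make the chaining concrete, below is a minimal Python sketch of the logic of \mathsf P_{ac}, run outside any circuit. It is a sketch under stated assumptions: verify is a mock stand-in for the succinct verifier \mathbf V (it accepts everything here), SHA-256 stands in for Hash, and names such as agg_step and H0 are illustrative rather than an existing API.

```python
import hashlib

H0 = b"\x00" * 32                                   # initial chaining value h_0

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(p_desc: bytes, instance: bytes, proof: bytes) -> bool:
    return True                                     # mock of the succinct verifier V

def agg_step(p_c: bytes, h_in: bytes, pi_in: bytes,
             x_c: bytes, pi_c: bytes, p_ac: bytes) -> bytes:
    assert verify(p_c, x_c, pi_c)                   # 1. verify the fresh compliance output
    if h_in != H0:
        assert verify(p_ac, p_ac + h_in, pi_in)     # 2. verify the previous aggregation
    return Hash(x_c, h_in)                          # 3. accumulate x_c in the chaining hash

# Aggregate three compliance outputs (proofs are dummies in this sketch):
p_c, p_ac = b"<compliance program>", b"<aggregation program>"
h = H0
for x_c in (b"x_1", b"x_2", b"x_3"):
    h = agg_step(p_c, h, b"<pi_in>", x_c, b"<pi_c>", p_ac)
```

In a real deployment each such step would itself be executed and proven inside the aggregation circuit, so that the proof of a step attests both to the two verifications and to the hash update.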
Combining aggregations
With the IVC approach outlined above, a single actor can aggregate all the individual compliance proofs, independently of whether the proofs are from the same action (scope) or not.
Another possibility is that different actors aggregate in turns. For example, users announce unbalanced transactions with actions containing a single aggregated compliance proof \pi_a. The solver compiles the final transaction, and in the process aggregates the aggregations.
How do we aggregate aggregations? The process is the same, noting that executions of \mathsf P_{ac} can be verified incrementally once again.
\mathsf P_{aa}(\mathsf p^{in},h^{in},\pi^{in},x_{ac},\pi_{ac})\rightarrow (\mathsf p^{out},h^{out}):
- Check 1 = \mathbf V (\mathsf p_{ac},x_{ac},\pi_{ac}) // the description of the aggregation program \mathsf P_{ac} is hard-coded in \mathsf P_{aa}
- If h^{in}\neq h_0, check 1 = \mathbf V (\mathsf p^{in},(\mathsf p^{in},h^{in}),\pi^{in})
- Output h^{out} := Hash(x_{ac},h^{in}) and \mathsf p^{out}:=\mathsf p^{in}
The two-round aggregation process is depicted below.
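The solver-side step can be sketched the same way. As before, verify is a mock of \mathbf V, SHA-256 stands in for Hash, and aa_step is an illustrative name; the point is only that the instance of each round-one aggregation is folded into a new chaining hash.

```python
import hashlib

H0 = b"\x00" * 32

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(p_desc: bytes, instance: bytes, proof: bytes) -> bool:
    return True                                      # mock of the succinct verifier V

def aa_step(p_ac: bytes, h_in: bytes, pi_in: bytes,
            x_ac: bytes, pi_ac: bytes, p_aa: bytes) -> bytes:
    assert verify(p_ac, x_ac, pi_ac)                 # verify one action's aggregation
    if h_in != H0:
        assert verify(p_aa, p_aa + h_in, pi_in)      # verify the previous P_aa step
    return Hash(x_ac, h_in)                          # accumulate the aggregated instance

# Round two: the solver folds the aggregated instances of two actions.
p_ac, p_aa = b"<P_ac description>", b"<P_aa description>"
h = H0
for x_ac in (b"<instance of action 1>", b"<instance of action 2>"):
    h = aa_step(p_ac, h, b"<pi_in>", x_ac, b"<pi_ac>", p_aa)
```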
Non-uniform aggregation
Suppose we have a total of \ell programs \mathcal P :=\{\mathsf P_1,\ldots, \mathsf P_\ell\}. In practice, these are the \ell different resource logics of a given scope. Whether the scope is an action or a transaction only matters to decide if we aggregate in turns or in one go. The programs are numbered as they appear in resources (resources are assumed to be canonically ordered in the scope). In particular, they may repeat, i.e. \mathcal P is a multiset, meaning that for i\neq j, we may have \mathsf P_i = \mathsf P_j.
The blueprint presented for aggregation of compliance proofs is:
- at each step, verify an output of a specific program (the incremental verification),
- accumulate the verified output in a chained hash,
- verify correct aggregation up to the previous step (by recursive proof verification).
The blueprint can be translated to arbitrary program verification. The difference is that now at each incremental step we need to verify a different program. The first idea that comes to mind is to hard-code the \ell programs in the aggregation circuit. But that would yield rigid aggregations: each multiset \mathcal P of programs would necessitate its own aggregation program.
We seek a universal aggregator that works for any multiset of programs (any two \mathcal P \neq \mathcal P'). To achieve this, the program \mathsf P_{j_i} verified in the i-th incremental step is passed as an input. To keep track of which program is verified, the same trick as for outputs is used: the aggregation includes the (description of the) program \mathsf P_{j_i} in a chained hash.
The pseudocode of the universal aggregator is as follows:
\mathsf P_{ua}(\mathsf p^{in},h^{in},d^{in},\pi^{in},\mathsf p,x,\pi)\rightarrow (\mathsf p^{out},h^{out},d^{out}):
- Check 1 = \mathbf V (\mathsf p,x,\pi) // the description of the verified program \mathsf p is an input of \mathsf P_{ua}
- If (h^{in},d^{in})\neq (h_0,d_0), check 1 = \mathbf V (\mathsf p^{in},(\mathsf p^{in},h^{in},d^{in}),\pi^{in})
- Output h^{out} := Hash(x,h^{in}), d^{out} := Hash(\mathsf p,d^{in}), and \mathsf p^{out}:=\mathsf p^{in}
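A sketch of the universal aggregator's step logic, under the same assumptions as the previous sketches (mock verify, SHA-256 as Hash, illustrative names ua_step, H0, D0): the only change is the second chained hash d that records which program was verified at each step.

```python
import hashlib

H0 = D0 = b"\x00" * 32                               # initial values h_0 and d_0

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(p_desc: bytes, instance: bytes, proof: bytes) -> bool:
    return True                                      # mock of the succinct verifier V

def ua_step(h_in: bytes, d_in: bytes, pi_in: bytes,
            p: bytes, x: bytes, pi: bytes, p_ua: bytes):
    assert verify(p, x, pi)                          # verify program p on output x
    if (h_in, d_in) != (H0, D0):                     # not the first step
        assert verify(p_ua, p_ua + h_in + d_in, pi_in)
    return Hash(x, h_in), Hash(p, d_in)              # chain the output and the program

# Aggregate outputs of two resource logics and one compliance unit:
h, d = H0, D0
steps = ((b"<logic 1>", b"x_1"), (b"<logic 2>", b"x_2"), (b"<compliance>", b"x_c"))
for p, x in steps:
    h, d = ua_step(h, d, b"<pi_in>", p, x, b"<pi>", b"<p_ua>")
```

The final instance (\mathsf p_{ua}, h, d) then commits to both the outputs and the programs verified along the way.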
Soundness
Sound aggregation relies on the knowledge soundness of the proof system (\mathbf P, \mathbf V), and on the collision resistance of the hash function Hash.
Take programs \mathsf P_1,\ldots, \mathsf P_\ell and outputs x_1,\ldots,x_\ell, and form the chained hashes:
- h = Hash(x_\ell,Hash(x_{\ell-1},\cdots,Hash(x_1,h_0))\cdots )
- d = Hash(\mathsf p_\ell,Hash(\mathsf p_{\ell-1},\cdots,Hash(\mathsf p_1,d_0))\cdots )
Suppose there exists a proof \pi_{ua} that verifies, i.e. \mathbf V (\mathsf p_{ua},(\mathsf p_{ua},h,d),\pi_{ua}) = 1. Then we can use the knowledge extractor to obtain an input (\mathsf p^{in}_{ext},h_{ext},d_{ext},\pi_{ext},\mathsf p_{ext},x_{ext},\pi'_{ext}) such that h = Hash(x_{ext}, h_{ext}) and d = Hash(\mathsf p_{ext}, d_{ext}). Moreover, \pi'_{ext} is a valid proof for program \mathsf p_{ext} and output x_{ext}.
If x_{ext} \neq x_\ell or \mathsf p_{ext} \neq \mathsf p_{\ell}, the extractor yields an algorithm that finds collisions for Hash. The argument can be repeated down the chain to conclude that each x_i is a verified output of the program \mathsf P_i.
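The statement ultimately certified by the final proof is that (h, d) equal these nested hashes. As a small illustration, this is how a relying party could recompute them from the expected lists of outputs and programs before checking the single proof \pi_{ua} (same toy Hash and initial values as in the sketches above).

```python
import hashlib

H0 = D0 = b"\x00" * 32

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def chain(items, init):
    acc = init
    for item in items:       # h = Hash(x_l, Hash(x_{l-1}, ... Hash(x_1, h_0)...))
        acc = Hash(item, acc)
    return acc

outputs  = [b"x_1", b"x_2", b"x_3"]
programs = [b"p_1", b"p_2", b"p_3"]
h_expected = chain(outputs, H0)
d_expected = chain(programs, D0)
# Accept iff V(p_ua, (p_ua, h_expected, d_expected), pi_ua) = 1.
```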
Parallel proving
Proof carrying data (PCD) is a generalization of IVC that allows generating transcripts of any shape (not just sequential ones). Essentially, the witness is split into input and local data. In our case the input/output of the aggregation is the running chained hash, and the local data is the instance and proof that is verified incrementally. Suppose we want to aggregate the validation of three outputs x_1,x_2,x_3 of three programs \mathsf P_1, \mathsf P_2, \mathsf P_3. The universal aggregator can be modified to accept two running pairs of hashes (h^{in}_1,d^{in}_1), (h^{in}_2,d^{in}_2) and perform two recursive proof validations, one for each pair. Incremental aggregation is now more complex, but it allows parallelizing via tree-like transcripts.
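A possible shape of such a merge node is sketched below. The rule for combining the two branches (hashing both running pairs together with the fresh output and program) is one choice among several, not something prescribed here; verify remains a mock of \mathbf V and all names are illustrative.

```python
import hashlib

H0 = D0 = b"\x00" * 32

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(p_desc: bytes, instance: bytes, proof: bytes) -> bool:
    return True                                       # mock of the succinct verifier V

def merge_step(branch1, branch2, p: bytes, x: bytes, pi: bytes, p_pcd: bytes):
    # Each branch carries a running hash pair and a proof of its subtree.
    assert verify(p, x, pi)                           # fresh incremental verification
    for h, d, pi_b in (branch1, branch2):
        if (h, d) != (H0, D0):                        # skip empty branches (leaves)
            assert verify(p_pcd, p_pcd + h + d, pi_b)
    (h1, d1, _), (h2, d2, _) = branch1, branch2
    return Hash(x, h1, h2), Hash(p, d1, d2)           # fold both branches into one pair

# Two leaves proven in parallel, then merged while verifying a third output:
leaf1 = (*merge_step((H0, D0, b""), (H0, D0, b""), b"<P_1>", b"x_1", b"<pi_1>", b"<p_pcd>"), b"<pi_leaf1>")
leaf2 = (*merge_step((H0, D0, b""), (H0, D0, b""), b"<P_2>", b"x_2", b"<pi_2>", b"<p_pcd>"), b"<pi_leaf2>")
root = merge_step(leaf1, leaf2, b"<P_3>", b"x_3", b"<pi_3>", b"<p_pcd>")
```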
Aggregating transaction proofs: from RM to settlement layer
The universal aggregator allows generating a single proof for all proofs appearing in a transaction. Therefore, logic and compliance proofs can be accumulated together. This can be done either in two rounds (aggregate within actions, then across actions) or in a single round (aggregate all individual logic and compliance proofs).
Well, you see where this is going. There is no reason we cannot aggregate transaction proofs. In the incremental proof generation, the digest and proof of the previous transaction are used to generate the proof of the new transaction. Obviously, this aggregation must be sequential and inherently two-round. Once the new proof is generated, the old one can be deleted. At all times, there will be a single proof attesting to the correctness of the logic and compliance predicates of all resources of the RM state. We can see this last aggregated proof as attesting to the well-formedness of the RM state.
Instance generation. The instance of an aggregation will always consist of two chained hashes. The first hash (h) accumulates the instances of the incrementally verified programs; in other words, all the resources of the transaction. The second hash (d) accumulates all the programs, that is, the compliance and logic predicates. How exactly the hashes are formed depends on the aggregation strategy: the fan-in and ordering used to form the transcript. In the picture above we can see two different aggregation strategies.
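As a small illustration of why the strategy matters, the same outputs chained in a different order give a different instance hash, so prover and verifier must agree on the transcript shape (toy SHA-256 chaining, illustrative only).

```python
import hashlib

H0 = b"\x00" * 32

def Hash(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

xs = [b"x_1", b"x_2", b"x_3", b"x_4"]

# Strategy A: canonical ordering.
h_a = H0
for x in xs:
    h_a = Hash(x, h_a)

# Strategy B: same fan-in, reversed ordering.
h_b = H0
for x in reversed(xs):
    h_b = Hash(x, h_b)

assert h_a != h_b    # same outputs, different transcript => different instance
```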