ART report: ZKVMs

Despite the ZKVM report not being finished, feedback is already welcome

1 Like

For tomorrow’s discussions with Adrian Hamelink and Ferdinand Sauer, what questions or comments related to STARK/SNARK ZKVMs, the role of the Juvix compiler or any other question related to the ZKVM report would you like to address? I’m aiming to have an agenda for tomorrow. For example:

  • How can compilers help us in designing ZKVMs?
  • Does recursion in STARKs pose any security problems?
  • Do folding schemes leak any information about the verifier key? In other words, can we implement function privacy in a folding scheme like ProtoStar?
  • What are the considerations we should have to implement ProtoStar in Halo2?
  • What differences in recursion do PCS such as FRI/Brakedown vs IPA/KZG bring?
  • What are the pros and cons between a NIVC dynamic ZKVM and a traditional STARK ZKVM?
  • Do folding schemes require a cycle of curves?
  • Is the work on towers of binary fields compatible with folding?
  • How expensive is the overhead of folding?

If you have answers to these questions, feel free to write them down here, too :pray:
@paulcadman @xuyang @jan @Lukasz @cwgoes @jonathan

Compatibility of Juvix backends with folding (speculative).

Juvix supports (or will support in the near future) three approaches
to compilation into circuits:

  1. Whole-program compilation-by-normalization to VampIR.

    This pipeline is the most complete one with support for all Juvix
    features, but it’s not compatible with folding. It fundamentally
    relies on compiling the whole program at once and it’s not possible
    to modularize it.

  2. Compilation to GEB.

    In principle, it should be easier to modularize GEB
    compilation. One could e.g. try per-function compilation,
    generating a separate circuit for each function. A necessary
    requirement is that high-level Juvix features (e.g. algebraic data
    types, higher-order functions) are explicitly encoded into field
    elements, according to some fixed encoding. It seems that this is
    what GEB essentially does, but one could also use a different
    encoding scheme. This encoding scheme then makes it possible to
    compile parts of Juvix programs separately. This is necessary but
    not sufficient, because it still needs to be figured out how
    exactly to divide the programs. With this approach one would divide
    the high-level lambda-calculus JuvixCore or GEB representation of
    Juvix programs, which might be more difficult than dividing a
    lower-level instruction sequence.

  3. Compilation to CairoVM.

    A Juvix program can be compiled to a sequence of “imperative”
    CairoVM instructions (or similar). Then to divide the program one
    could choose any division of the instruction sequence into
    subsequences and compile those into separate circuits. This seems
    easiest to implement on the Juvix side as it does not require
    inventing any new encodings for high-level features, only choosing
    a division of the instruction sequence.
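The division described in item 3 can be sketched in a few lines. This is a toy illustration: the instruction strings and the block size are made up, not actual CairoVM instructions.

```python
# Sketch of dividing a compiled instruction sequence into subsequences,
# each of which would be compiled into its own circuit. The instruction
# names and block size are hypothetical placeholders.

def divide_instructions(instructions, block_size):
    """Split a flat instruction sequence into fixed-size subsequences."""
    return [instructions[i:i + block_size]
            for i in range(0, len(instructions), block_size)]

# A toy "CairoVM-like" instruction sequence (illustrative only).
program = ["push 2", "push 3", "add", "store a", "push 1", "add", "store b"]

blocks = divide_instructions(program, 3)
# Each subsequence would become a separate circuit; the prover then proves
# the blocks in order, carrying the machine state between them.
```

The only choice the compiler has to make here is where to cut, which is why this approach seems easiest on the Juvix side.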

1 Like

I was thinking about this more last night and realized that there are several distinctions in practice here. Let’s say there are different versions of this dynamic instruction-set generation strategy; here are some I can think of:

  1. Fixed instruction set, no segmentation (no folding scheme)

    Compile time: the program is compiled to a specific instruction set (e.g. RISCV)
    Execution time: the program (in that instruction set) is executed, generating a trace.
    Proving time: the trace is proved in a single circuit (which iterates the RISCV interpreter).

  2. Fixed instruction set, fixed segmentation

    Compile time: the program is compiled to a specific instruction set (e.g. RISCV).
    Execution time: the program (in that instruction set) is executed, generating a trace.
    Proving time: the trace is split up into blocks (e.g. 100-instruction blocks) and each block is proved with the folding scheme (in a circuit which iterates the RISCV interpreter 100 times).

  3. Fixed instruction set, control flow segmentation

    Compile time: the program is compiled to a specific instruction set (e.g. RISCV) in a control flow graph. Each block (without jumps) is compiled to a single circuit (set of constraints).
    Execution time: the program (in those blocks) is executed, generating a trace.
    Proving time: each block is proved with the folding scheme.

  4. Fixed instruction set, dynamic control flow segmentation

    Like (3) but the blocks chosen for the folding scheme don’t need to correspond 1-1 to the blocks in the control flow graph.

  5. Fixed instruction set, just-in-time compilation

    Like (4) but the blocks can be chosen when the actual program inputs are known. This requires an additional ZK proof of verifiable compilation :laughing:
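Strategy 3 above hinges on extracting the control flow graph’s basic blocks. A minimal sketch, with a made-up instruction encoding (not RISCV):

```python
# Sketch of control-flow segmentation (strategy 3): split an instruction
# sequence into basic blocks at jump instructions and jump targets.
# The (op, arg) encoding is a made-up illustration.

def basic_blocks(instrs):
    """Return basic blocks: maximal runs with no internal jumps or targets."""
    leaders = {0}                      # a "leader" starts a new block
    for i, (op, arg) in enumerate(instrs):
        if op == "jmp":
            leaders.add(arg)           # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)     # the instruction after a jump, too
    starts = sorted(leaders)
    return [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

prog = [("set", 2), ("add", 3), ("jmp", 0), ("halt", 0)]
blocks = basic_blocks(prog)
# blocks: [[("set", 2), ("add", 3), ("jmp", 0)], [("halt", 0)]]
```

In strategies (4) and (5) the proved blocks would then be re-cut from these, rather than taken 1-1.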

Overview (zkVM, ZK-VM, ZKVM)

Notes from the session on ZKVMs with @alberto @cwgoes @xuyang @adrian Ferdinand & Adrian. Note that I may have missed things or may be incorrectly representing the content; feel free to suggest any changes.

We have a language called Juvix which we want to compile to circuits. There are many ways to compile programs; one of them is to compile the whole program into a giant circuit. Another approach is the modular way. There have been five years of work on STARKs for ZK-VMs. I’ve been writing this article, which explores different ZK-VM approaches and asks what a ZK-VM even is.

The conventional understanding requires a fixed set of instructions, i.e. an ISA; one can also think of a ZK-VM as an instance of IVC. This is the approach I am investigating in this document:

  • How IVC works
  • How IVC has evolved

We have seen how the verifier in IVC has done decreasing amounts of work over time. At the beginning, it used to be expensive. With Halo 2 we have seen that the expensive part of the verifier is gone, allowing Halo 2 not to have a trusted setup. More recently, folding schemes removed most of the verifier’s work: it only needs to prove that the folding has been done correctly. Each step in this IVC chain is a NARK and doesn’t need to be succinct, so it is quite fast.

This is the ground for this discussion. There is something particular about IVC schemes: the function that gets computed at every step is fixed. Meaning, before ProtoStar and SuperNOVA, if we wanted to encode the whole circuit we had to pay its full price at each step, which didn’t make it appealing. Non-uniform IVC, which SuperNOVA and ProtoStar made more efficient, moves from one function to a set of functions we can select from, paying only the price of the F_i that actually gets computed. Suddenly, we have something very appealing as a ZK-VM. ProtoStar can have Plonkish arithmetization, which is easier to reason about. Nova is based on R1CS.

We want to use compilers in this process. What do they bring to this context? Compilers have some understanding of the program we are working on. Therefore, we can work over a dynamic set of instructions dependent on the particular program we are compiling. A general program may consist of many small circuits, and for a small computation the overhead of folding is relatively expensive. If we have a compiler, we can shape the set of F_i’s constituting this non-uniform IVC scheme, finding the sweet-spot balance between circuit size and the number of circuits. The main question here is how compilers can help us design ZKVMs.


Chris had an insight about different types of ZK-VMs we may think of.

Some of my interest in the question comes from thinking about ZK-VMs in contrast to VMs. VMs originally come from thinking about physical hardware, which you can’t change, so you need to fix an instruction set. That part of the analogy does not hold here: we are talking about virtualized hardware, which can change at any time as long as prover and verifier agree. Given we have fewer constraints, are there interesting things we can do, and what are the choices and tradeoffs?

What is a program? One very common abstraction used in compilers is the control flow graph, which splits a program into a series of blocks, with arrows between blocks based on jumps. A block usually looks like computing A = 2; B = 3 + A; followed by some kind of conditional jump. Within each block there is no control flow, which means you just need to compute all the things; compiling a single block through a generic ZK-VM is pointless, because we need to compile each block anyway.

  • NIVC comes with folding. It’s not the same as folding, but it was introduced by SuperNOVA as well: at every iteration you can execute a specific circuit. Folding schemes don’t work with STARKs.

The input is some field elements and the output is some field elements. Presumably, when you compile to a circuit, you get input and output as field elements. At a high level, we implement something that runs this process: based on the output of the circuit, it knows which circuit to use next. On top of that, we can have a program that just selects the right circuit. There will still be some base or final circuit, a memory-consistency circuit which checks that the right branches were taken.
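The selection process described above can be sketched as a toy driver. No cryptography here: the two "circuits" are plain functions, and the selector rule is invented for illustration.

```python
# Toy sketch of NIVC-style circuit selection: each step's output determines
# which circuit runs next; a final consistency check replays the selector
# decisions (the role of the "base or final circuit" above).

def double(x):      # hypothetical "circuit" 0
    return 2 * x

def increment(x):   # hypothetical "circuit" 1
    return x + 1

CIRCUITS = [double, increment]

def select_next(output):
    """Selector: even outputs route to circuit 1, odd outputs to circuit 0."""
    return 1 if output % 2 == 0 else 0

def run(z0, steps):
    trace = []                       # (circuit_index, input, output) per step
    z, idx = z0, 0
    for _ in range(steps):
        out = CIRCUITS[idx](z)
        trace.append((idx, z, out))
        z, idx = out, select_next(out)
    return z, trace

def consistency_check(trace):
    """Replays the selector decisions: were the right branches taken?"""
    return all(nxt_idx == select_next(zout) and nxt_in == zout
               for (_, _, zout), (nxt_idx, nxt_in, _) in zip(trace, trace[1:]))

z_final, trace = run(1, 3)
```

In the real scheme, each step would be folded and the consistency check would live in a circuit; here it is just a plain function over the trace.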

This would be the deciding phase: making a SNARK for all your folding accumulators, or having a circuit for the consistency check.

Consistency check circuit

  • checks memory copied correctly
  • checks that correct branches were taken

The verifier key is bound to the consistency check circuit; that way the verifier knows it’s checking what it can check.

From a compiler perspective, what we already do for VampIR compilation is similar to what we can do with each block. VampIR assumes everything is done only once: if you do it once, the whole program takes some field elements as input and outputs some field elements. When you split it, the paths between blocks become inputs and outputs; you will have algebraic data types with higher-order functions inside, and you need to introduce some encoding of these as field elements.

One concern is function privacy. With this approach you don’t get it per se: there is some leakage here. Will you be able to tell which path the program has taken?

I think you can hide it: if everything is ZK, you can hide that stuff away as private input. The states in between will be gone or ZKified. You do need all circuits 1, 2, 3 to be public. You probably want to do it at the end to compress the proof anyway.

The only things you need to check are the verification keys of each circuit at each folding step: make sure the opcodes you are running at each step are consistent. The final deciding phase ensures all this is correct. The deciding phase takes the accumulation states of circuits 1, 2, 3. Circuit one might be folded, say, 5 times across its 5 executions; the decider compresses all the accumulators into one succinct proof at the end and verifies accumulation.
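The fold-then-decide flow above can be sketched as a toy model. This ignores cross terms, commitments, and soundness entirely; the circuit names and the folding rule (a random linear combination of vectors) are illustrative only.

```python
# Toy model of the fold-then-decide data flow: one accumulator per circuit;
# folding takes a random linear combination of the running accumulator and
# the fresh instance. No real cryptography; it only shows the shape of it.
import random

def fold(acc, instance, r):
    """Fold a fresh instance into an accumulator (toy: acc + r * instance)."""
    return [a + r * x for a, x in zip(acc, instance)]

def prove_execution(instances_per_circuit):
    # One accumulator per circuit, as in the ProtoStar/SuperNOVA picture.
    accumulators = {}
    for circuit_id, instances in instances_per_circuit.items():
        acc = [0] * len(instances[0])
        for inst in instances:
            r = random.randrange(1, 2**16)   # stand-in for a challenge
            acc = fold(acc, inst, r)
        accumulators[circuit_id] = acc
    # The decider would then compress all accumulators into one succinct proof.
    return accumulators

accs = prove_execution({"circuit_1": [[1, 2], [3, 4]], "circuit_2": [[5, 6]]})
# circuit_1 was folded twice (its 2 executions); circuit_2 once.
```

The key point the sketch preserves: per-step work is a cheap fold, and all the expensive proving is deferred to the single deciding phase.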

This leads to the question: you mentioned we can have function privacy, where we don’t know which application we fold in the first place. Can we have function privacy with folding, and if yes, how? We know that with Halo 2 we can have function privacy. I don’t know if accumulation vs. folding is the deciding factor.

Function privacy is more a property of the VM than of the proof system.

You can either do a chain with one accumulator and keep adding to it, or you have a tree-like structure with N proofs and you fold them 2 by 2 into a succinct proof at the end. You can make it independent with ZK by randomizing what you put out.

Can you hide things? Yes. In the hiding process, as far as I know, current folding schemes take statements one and two as public input and fold them into the final one. If we want to hide statements 1 and 2, they become private inputs, with commitments to the two statements.

Instead of a recursive verification circuit, if you have an IVC chain where you keep adding: the input is (x_1, w_1), the output is (x_3, w_3), and you provide (x_2, w_2) as private input to the circuit.

If you fold 2 by 2, both can be private inputs. At the start there is an empty accumulator, enforced by the recursion circuit. If the fresh proof, the accumulator, and your scheme are ZK, then it will be independent of the output.

Two statements will be folded into statement three. Can we achieve function privacy via folding recursion? We have some generic function.

You want the ability to call a function without revealing what it was, where the result of the function is not used elsewhere. If you want to do this kind of thing and you want a precomputed commitment to the VK, then it’s just saying: I’m calling some function which belongs to some set, and here is a proof for it.

We have a private circuit. If we call a function F, we don’t want to disclose which F is called. So we commit to F, then we generate a proof for this F, and finally we conceal F and generate a recursive proof. Eventually we submit the proof of step 3, which already proves that this F was verified.

Can you achieve the same thing with folding? There can be folding if there are several Fs… Assume there is one F in step 2.

  • As I understand, Halo 2 has two steps; the IVC verifier has two main operations:

    1. Creating a polynomial-commitment pair (p_i, c_i)
    2. Opening of the commitment (PCS)

  • Extending the circuit set: {F_1, …, F_n} ∪ {F_{n+1}}

Folding does neither step 1 nor step 2. If we can do this with Halo 2, the key differentiating factor is what enables us to do full recursion. Not Halo 2 specifically: it can be any recursive SNARK.

Does this allow us to have function privacy?

In the SuperNOVA context, where you have an accumulator for each verification key, you have a set of accumulators for the stuff you can prove up until now. You can think of it as a hash map where the keys are verification keys. If you want function privacy, you have the VK of a program you don’t want to reveal, so you use a hiding commitment to the verification key. When folding, you take this set and look up the accumulator corresponding to the hidden commitment of the verification key, fold this function into it, and return the next accumulator. Once you have finished the IVC loop, you have a set of accumulators, and then you need to create proofs for them. You can create proofs of the accumulators and a batch proof at the end. You don’t know which functions were folded at each iteration, and you don’t need a final ZK verification at each step.
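The accumulator map described above can be sketched as follows. The "commitment" here is just a salted hash and the keys are made-up names, purely to illustrate the lookup-by-hidden-key structure.

```python
# Sketch of the accumulator map: keys are hiding commitments to verification
# keys, so a folding step can look up the right accumulator without revealing
# which function was folded. Salted SHA-256 stands in for a real commitment.
import hashlib

def commit(vk, salt):
    """Toy hiding commitment: a salted hash of the verification key."""
    return hashlib.sha256(salt + vk).hexdigest()

class AccumulatorMap:
    def __init__(self, verification_keys, salt):
        # One (initially empty) accumulator per possible function.
        self.accs = {commit(vk, salt): [] for vk in verification_keys}

    def fold_step(self, vk_commitment, instance):
        # Look up the accumulator for the hidden verification key and
        # fold the new instance into it (toy: just append).
        self.accs[vk_commitment].append(instance)

salt = b"fresh-randomness"
vks = [b"vk-transfer", b"vk-swap"]   # hypothetical function VKs
m = AccumulatorMap(vks, salt)
m.fold_step(commit(b"vk-transfer", salt), "step-1")
# An observer sees only commitments, never which verification key was used.
```

Empty accumulators for functions that were never called cost nothing until the deciding phase, which matches the NIVC point below.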

The point of having NIVC is that you don’t pay the cost of every instruction block. But in the end, when you have to prove this entire map of accumulators, aren’t you back to proving every instruction block? For every independent circuit you have to create a SNARK, but you only pay the full cost of executing a SNARK once.

There are different ways of achieving this function privacy, but it’s more about the semantics of the VM than about the proof system itself. Will this speed up proof generation? The recursive verification circuit is on the order of 10,000 constraints, so you need to make sure the work you are doing in each instruction block is at least 10,000 constraints; otherwise it doesn’t make sense.
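The break-even point above is worth making explicit. The 10,000-constraint figure is the rough number quoted in the discussion; everything else is simple arithmetic.

```python
# Back-of-the-envelope check of the break-even point: if the recursive
# verifier embedded in each folding step costs about 10,000 constraints,
# each folded block should do at least that much useful work, or the
# relative overhead dominates. The constant is the rough figure from above.

RECURSION_OVERHEAD = 10_000  # constraints per folding step (rough figure)

def overhead_fraction(block_constraints):
    """Fraction of each step's constraints spent on the recursive verifier."""
    return RECURSION_OVERHEAD / (block_constraints + RECURSION_OVERHEAD)

assert overhead_fraction(10_000) == 0.5   # break-even: half the work is overhead
assert overhead_fraction(90_000) == 0.1   # larger blocks amortize the verifier
```

This is exactly the compiler’s lever mentioned earlier: choose block sizes so the useful work per fold dwarfs the fixed recursion cost.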

The control flow graph of the program and the blocks we prove do not need to correspond 1:1.

If you have a circuit which is n copies, you probably don’t need to actually prove all n iterations: I think you only need to provide witnesses for the executed steps and fill the rest with 0s.

How is this similar or different to Lurk?

We have a main circuit which takes the instruction, environment and continuation, and returns the next instruction and continuation. The main branching circuit, with ~10k constraints, contains multiple instantiations of this loop. It sounds similar to the consistency check circuit: many iterations of the STF. With SuperNOVA we have these coprocessors, and at any point we can call a Circom circuit inside the loop. A coprocessor is almost like these blocks. Do you define specific ones, or are they generated by Lurk? I don’t know yet. I think you can write them in Lurk or have a highly optimized R1CS circuit written for you. It is an R1CS architecture for a state transition function (STF), and you can load complex instructions for stuff you want a proper circuit for.

How modular is your compiler pipeline? We have a HLL which allows us to define an evaluation function and generate a circuit for it, so we can play around with the STF. This creates a circuit which runs for 1000 iterations or so: one circuit built on lookups, calling coprocessors and so on.

People were talking about folding on every opcode, but then you have a high recursion cost; I don’t think that is really correct, because it would be impractical. It comes down to how you design a VM. This is nice in theory, but no one does that.

Another project at least worth skimming is powdr: you can define your own ZK-VM, and it will build a ZK-VM with only the instructions that you need. I think there is some fundamental difference between this and a monolithic ZK-VM that has all the opcodes.

The compiler side makes sense. It can be done in the sense of compilation; more concretely, you would need an encoding of things. You can try to use GEB or implement the pipeline that they want. The encoding is the most non-trivial and time-consuming part.

A very specific question: you implemented ProtoStar in Halo 2 (PSE), with some work still to be done. If the folding scheme allows us to have function privacy, then we don’t need to implement a compilation scheme, because we can just implement ProtoStar.

What is still to be done?

What you would need is to define the semantics for this, verifying the previous folding. The way I like to think of it:

z_{i+1} = F(z_i)
F'() -> (H(z_i), H(z_{i+1}))
z_i <- V(π_i, z_0)
z_{i+1} = F(z_i)

You need to take the function F and wrap it in another circuit, taking an accumulator and putting it into another accumulator. This is what’s missing in Halo 2. There is a framework called snark-verifier which allows you to simulate an IVC circuit for some function. There could be verification of a KZG proof, or partial verification of accumulation that you return to later on. This verification will be as small as you can get and will allow you to minimize constraints.

The decider will create a proof for the final accumulator, which is a normal SNARK proof. Because of how ProtoStar works, it’s very close to a sum-check, so you probably want something HyperPlonk-based where sum-check is a first-class primitive.

The snark-verifier is built by the PSE folks, written for Halo 2 (PSE). Axiom has a fork of PSE Halo 2 and a fork of the PSE snark-verifier. From what I’ve seen, the two can’t be merged back into each other; they are two separate projects now. My impression is, if you want to do this kind of stuff and do it well, I would recommend forking Halo 2, ripping out most of it, and tailoring it to what you are proving. If you know what you are doing, it allows you to write the prover yourself. You mentioned Halo 2 models circuits with many columns; for ProtoStar you only need one. If you use generic Halo 2, you will have traces of your circuit that are not efficient for the circuit being proven in ProtoStar, since the Plonkish provers were written with that model in mind.

You’d rather have a wider circuit with shorter columns: if you just have a single column, you are paying n log(n). With ProtoStar, you really want to minimize the number of commitments you are seeing in your proof, kind of like what is done in R1CS. In the recursive verifier you pay one scalar multiplication per committed column, so if you can get the recursive overhead down to one, it’s going to be better. The way they do lookups, for each table you have a set of columns; instead, pack all of your lookups into a single table and use domain separation. There isn’t a way to do this without modifying the Halo 2 API until it fits your needs.

Not convinced by function privacy

After folding, does the verifier know that x_3 contains x_1 and x_2? When you are verifying the circuits, you still need the verification key. Does the verifier know VK 1 and 2? You can’t hide that, because you need the same verification key for all three statements.

Does the verifier know what it will be verifying? Is the big F concealed or not? That’s the decider’s task at the very end: if the decider is ZK with respect to the function, then the entire thing is. The overall statement is what I care about.

The verifier at the level of the resource machine knows the commitment to the verifier key but not the verifier key itself, with one more layer of recursion. There are two verifiers; the first verifier verifies that a BTC predicate was correctly computed. In the previous plan for function privacy, we have to somehow check this even though the predicate being called changes. How do we check this in the previous plan? We don’t know which predicate it is, but we know it’s the correct logic. If we are not adding ZK in the folding, only in the final recursive proof phase, we can’t get ZK in the folding itself; you make a second layer which proves that this virtual decider returns one.

{F_1, …, F_n} ∪ {F_{n+1}}

Can we use the same technique with a function F which is itself a recursive function?

F = S_1(F_1) + S_2(F_2) + … + S_n(F_n)

There are no logic commitments between the two schemes.

Does folding enforce a state transition? Yes, I think you are proving a state S_n was computed from a state S_0.

You can do PCD from IVC; it’s just the semantics of your state transition to verify several incoming nodes and outputs.

Can we get zero knowledge in NIVC?

The decider-prover needs to know the verification keys of all the possible circuits that exist, and you will create proofs for every possible one; there just won’t be any work for circuits you haven’t executed. The verifier needs to know the verification keys for all these circuits. If you have a set of keys for circuits over multiple currencies, you would have to include them all and verify assuming you’ve run all of them, but then you can create a SNARK proof of the deciding verifier, so the final proof doesn’t depend on the number of possible instructions you have.

Does the verifier learn which functions we called? No, there does not need to be any logic to check whether the accumulators are empty or not. Is the proof ZK regarding accumulator state? Yeah, some of these can be private inputs; as long as the final SNARK doing the verification is ZK, all of your private inputs are hidden.

You drag this hash map along your IVC chain, and at the end the decider creates a SNARK for all of these things. The verifier for these proofs needs to know the verification keys; at this point you apply function privacy. These accumulators for private functions come from all possible currency circuits.

We store all the resource commitments, but not the resource_logic commitments, so the number of distinct programs is private.

Do we ever need to enumerate over this set? Do we have to iterate over the set somewhere?

It may be complicated to get the folding logic itself. If you run the BTC circuit many times, you won’t actually be able to fold it. For each one, you prove that the private function call comes from each circuit; that set can grow independently. When you do the final verification of the decider, you need to include the latest set of verification keys for each crypto circuit.

We have a Merkle tree of resource commitments, and within each resource is another commitment to the resource logic. The thing we want to hide in the whole transaction is which resources were consumed and what the logics were, which is function privacy. If these are the Fs, can we select the next Fs to be used by proving the commitment exists and that the set of possible Fs is this set? The set is finite, and we can prove membership, but we can’t iterate over it, if that distinction makes sense. We can prove membership in the Merkle tree to ensure the F we are selecting is correct.
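The membership-without-enumeration point can be made concrete with a standard Merkle proof sketch. The hash choice, leaf names, and tree layout are illustrative; in the real setting the leaf and path would be private inputs to a circuit.

```python
# Sketch: prove a chosen resource logic F is in the Merkle tree of
# commitments without enumerating (or revealing) the other leaves.
import hashlib

def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def build_tree(leaves):
    """Build all levels of a Merkle tree (leaf count a power of two)."""
    level, levels = [h(l) for l in leaves], []
    while len(level) > 1:
        levels.append(level)
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)
    return levels            # levels[-1][0] is the root

def merkle_proof(levels, index):
    """Collect sibling hashes (and our side at each level) for one leaf."""
    proof = []
    for level in levels[:-1]:
        proof.append((level[index ^ 1], index % 2))
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(sibling, node) if is_right else h(node, sibling)
    return node == root

leaves = [b"logic-A", b"logic-B", b"logic-C", b"logic-D"]  # hypothetical logics
levels = build_tree(leaves)
root = levels[-1][0]
proof = merkle_proof(levels, 2)   # membership proof for "logic-C"
```

The verifier touches only log(n) hashes, which is exactly why membership works while enumeration does not.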

Instead of having one accumulator per circuit, which is the ProtoStar model, with the HyperNova model you do a bit of extra work, but the accumulators are uniform: you can take two different circuits and combine them into a single accumulator, proving membership of the circuit key at that point. That might simplify the design, knowing which circuit you are folding, but not at the very end.

It seems the key question is: when we prove which F we are selecting, can we prove the resource_logic commitment?

Clearly, we can use folding to do the thing we were previously doing, but it doesn’t save us recursion for the previous component. Whether folding saves us recursion for the privacy we want has to do with how the function selection works. What we cannot do is enumerate over all the leaves of the Merkle tree. Maybe then we can save recursive steps later and still get the same privacy, but we are not sure. We probably need to dig into IVC more.

ProtoStar needs to know the VK when doing accumulation, so you still have a dependency on the VK which you need to extract at the end. With the HyperNova result, it no longer depends on the verification key, because the VK goes into the accumulator and you don’t need to keep track of it later on. The cost is log(n) hashes.

The idea with HyperNova is that you do a bit of extra work to prove your circuit, but you stop at the point where you have to prove a polynomial opens at a certain point and that polynomial is a random linear combination of linear columns and fixed columns.

We should dig into this in the report and distinguish different instantiations of folding.

Do users generate proofs over the original resource logics in the same model, where we just never fold? Cairo does not support folding. What does an actual observer learn? With STARKs, hash recursion can help conceal the resource and resource logic, but the observer knows everything about the intent.

The distinction between this and Cairo is only that this (folding) is faster. If you had data privacy but not function privacy, observers would learn BTC was sent but not how much or to whom. The resource_logic would leak but not the resource.

Jeremy would need to implement the full resource machine in Cairo to get function privacy.

Does folding need a cycle of curves?

The expensive part about the recursive verifier, especially in the SNARK context, is that you have to do non-native field operations. You will need an extra layer of recursion explicitly targeting the second curve. Verifying a proof on the cycle curve will be cheaper than doing non-native scalar multiplications on the same curve. You do need a cycle of curves to avoid making it prohibitively expensive; if you don’t use a cycle of curves, the amount of work the verifier does blows up.
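The blowup above can be illustrated with a rough cost model. All constants here are assumed, illustrative figures, not measurements of any concrete proof system.

```python
# Illustrative cost model for the cycle-of-curves point: a scalar
# multiplication over a non-native (emulated) field costs many times more
# constraints than a native one on a 2-cycle partner curve.
# Both constants below are rough, assumed figures.

NATIVE_SCALAR_MUL = 3_000   # constraints per scalar mul, native field (assumed)
NONNATIVE_BLOWUP = 30       # assumed blowup factor for emulated field arithmetic

def verifier_cost(num_scalar_muls, cycle_of_curves):
    """Constraints spent on the verifier's scalar multiplications."""
    per_mul = (NATIVE_SCALAR_MUL if cycle_of_curves
               else NATIVE_SCALAR_MUL * NONNATIVE_BLOWUP)
    return num_scalar_muls * per_mul

with_cycle = verifier_cost(3, cycle_of_curves=True)    # 9,000 constraints
without = verifier_cost(3, cycle_of_curves=False)      # 270,000 constraints
```

Whatever the exact constants, the multiplicative blowup is why skipping the cycle makes the verifier’s work explode.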

1 Like