Towards a general proof type

While I was working on writing up this discussion in this PR, I realized that we still don’t have a proper general proof type or multiformat, and I think we need one. In particular, I think that we want a proof type which can encompass “heterogeneous trust” / “heterogeneous assumption” proofs, including, for example:

  • A trivial proof where computation is simply replicated, and the data used as input to the computation is packaged in the proof. It has no extra security assumptions but is not succinct.
  • A trusted delegation proof where computation is delegated to a known, trusted party whose work is not checked. This scheme is succinct but requires a trusted party assumption (which could be generalised to a threshold quorum in the obvious way). Note that since the computation is still verifiable, a signer of (a, b, predicate) where predicate a b = 0 could be held accountable by anyone else who later checked the predicate.
  • A succinct proof-of-knowledge proof, where the result of computation is attested to with a cryptographic proof (of the sort commonly instantiated by modern-day SNARKs & STARKs). Succinct proof-of-knowledge schemes provide succinctness as well as verifiability subject to the scheme-specific cryptographic assumptions. They may also be zero-knowledge, in which case the verifier learns nothing other than predicate a b = 1 (in this case, and in others, a and b will often be “hidden” with hash functions, such that the verifier knows only hash a and hash b, but the substance of the relation obtains over the preimages).

This implies some slightly different requirements than would a simple “proof multiformat”. Let’s assume that the verifier - when verifying the proof - specifies which assumptions they’re willing to make, which might include cryptographic assumptions (as variously required by various SNARK/STARK schemes), trust assumptions (as required by a trusted delegation proof as described above), etc. These different trust assumptions (made by the verifier) should not, in general, entail different multiformat types - the set of possible trust assumptions is infinite, so that wouldn’t work - and we want e.g. trusted delegation proofs made by different parties to be one multiformat type, not multiple. I think a suitable multiformat might look more like the following:

  • We have some type of assumptions Assumption
  • prove_n generates the proof of type Proof_n
  • compatible_n checks if a proof of type Proof_n is compatible with a selected set of assumptions (returning true/false)
  • verify_n verifies a proof of type Proof_n
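A minimal Haskell-flavoured sketch of this interface (the Claim, Witness, and Assumption placeholders below are illustrative assumptions of mine, not part of the proposal):

-- Illustrative sketch only; placeholder types and constructor names are hypothetical.
import Data.Set (Set)

type Claim   = String   -- stand-in for statements like "predicate a b = 1"
type Witness = String   -- stand-in for whatever data the prover needs

data Assumption
  = TrustedSigner String      -- a trust assumption, parameterized by a party
  | CryptoAssumption String   -- e.g. "the algebraic group model holds"
  deriving (Eq, Ord, Show)

class ProofScheme proof where
  prove_n      :: Witness -> Claim -> proof           -- produce a Proof_n
  compatible_n :: proof -> Set Assumption -> Bool     -- acceptable under these assumptions?
  verify_n     :: proof -> Claim -> Bool              -- check the proof itself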

Question: should compatible_n and verify_n simply be combined?

Now we’ve shifted the problem to what exactly this “Assumption” type is. Without loss of generality, I think we can characterize assumptions as beliefs about logical implications (a \implies b for some a and some b), where, for example:

  • a could be that a particular trusted party has signed over a statement, and b could be that the statement is true
  • a could be that the algebraic group model holds, and b could be that a particular proof system is sound
  • a could be nothing, and b could be that c = c (some identity case)

In an ideal world, we could characterize all of these assumptions exactly (e.g. as mathematical statements) - however, that will not be feasible in the short term (precisely expressing cryptographic assumptions will require a sophisticated specification language), so we probably need a simple easily extensible sum type that can evolve along with the Proof multiformat (to add new types of assumptions). The encoding here needs to allow for additions to the sum type without breaking backwards compatibility (which may be a property we often want in general).
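As a hedged illustration of such an extensible sum type (tag values and constructor names are assumptions of mine), the encoding can preserve unknown tags rather than reject them, so adding new assumption kinds later does not break older decoders:

-- Hypothetical sketch: an Assumption encoding that tolerates tags added by later versions.
import Data.Word (Word8)
import qualified Data.ByteString as BS

data Assumption
  = TrustedSigner BS.ByteString      -- tag 0: key of a trusted party
  | CryptoAssumption BS.ByteString   -- tag 1: identifier of a named cryptographic assumption
  | Unknown Word8 BS.ByteString      -- any future tag: kept opaque, not rejected
  deriving (Eq, Show)

decodeAssumption :: Word8 -> BS.ByteString -> Assumption
decodeAssumption 0 body = TrustedSigner body
decodeAssumption 1 body = CryptoAssumption body
decodeAssumption t body = Unknown t body   -- backwards-compatible fallback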

I think that this should be sufficient - in principle - to allow for something like smart multievaluation which can e.g. use trusted third party delegation automatically.

Thoughts? @ray @Jamie @ArtemG @AHart @vveiln @degregat

2 Likes

Starting with a sum type sounds reasonable and I could imagine it even being sufficient in the long run: If verify and prove depend on the proof type, we can just couple them on this level and deal with assumptions out of band.

I do not see the need to couple a specification of assumptions to the proofs at this level, but I might miss something. What do you think would be the upsides of that?

Until now I was thinking about any assumptions apart from proof type (e.g. trust assumptions) as data to take into account for decisions of the node, but not directly relevant to the validation of the proof.

If the goal is to make all proof types mixable non-interactively, it seems to me compatible would need to be a dynamic function, since e.g. in trusted delegation its evaluation would probably be observer dependent. Do we want to encode something like this at the protocol level (if it would actually be needed)?

Would we need to verify that compatible was executed correctly? If not, it would be more of a hint anyway (→ out-of-band).

Could we verify that without falling back to another proof type? The only actual in-band proof types would be those for which no fallback is necessary, i.e. primitive or canonical proofs.

Even if compatible turns out not to be a necessary component, it seems like a useful convenience function, one that we should at least provide reasonable defaults for, but I guess we need to clarify where we draw the protocol boundary first.

Note: As far as I understand, a set of nodes which decides on the same canonical proof scheme(s) could theoretically operate on the same state set, similar to agreement on the hash function inducing the namespace.

1 Like

We want nodes (or their operators) to be able to make decisions at the level of more abstract assumptions, not just specific proof types. For example, many different specific zk-SNARK schemes rely upon the same or overlapping sets of cryptographic assumptions.

What do you mean by “dynamic” here? My idea was simply that compatible_n takes as input a set of assumptions which the node (= observer) in this case is willing to make - that set is indeed observer-dependent, but the function definition itself isn’t. I think this response also applies to your second question.

Yes, roughly - strictly speaking, additional proofs could be created with different assumptions to expand access to the same state set (as long as a party who has the relevant historical data is available to create said proofs).

1 Like

If there is always a 1-to-1 correspondence between proof type and assumptions, is it easier to agree on a canonical DSL vs agreeing on a canonical set of assumptions attached to each proof?

If the latter is possible, we could just do a check whether any proof relies only on permitted assumptions from a predefined set.

Understood, the only thing that would be dynamic would be, e.g. the signature, and compatible would treat it differently on evaluation.

Then it feels like the goal here would be to define a way of building a hierarchy of proofs, assumptions, and (proof, [assumption]) correspondences, based on a set of canonical proof types and a set of canonical assumptions.

This would be halfway between a sum type and a full formal DSL, but I think if we do it right, it might give us enough flexibility to even go down to the level of mathematical assumptions by resolving canonical proofs or assumptions at a later point, albeit without formal verification.

1 Like

I’m not entirely sure that I follow. There’s a many-to-many relationship, not a 1-to-1. For example, many cryptographic proofs require multiple assumptions (which are often shared between different cryptographic proof types), and if I create a trusted delegation proof and you create a trusted delegation proof, both are valid under the assumption that “both Chris & D are honest” and neither is valid under the assumption that “either Chris or D is honest” - but a proof we both sign would be.
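To make the delegation example concrete, here is a small sketch (function and type names are mine) of when a delegation proof signed by some set of parties is acceptable under an honesty assumption:

-- Illustrative sketch of acceptability under honesty assumptions.
import qualified Data.Set as Set

type Party = String

data HonestyAssumption
  = AllHonest  (Set.Set Party)   -- "every party in this set is honest"
  | SomeHonest (Set.Set Party)   -- "at least one party in this set is honest"

acceptable :: Set.Set Party -> HonestyAssumption -> Bool
-- one signer drawn from an all-honest set suffices
acceptable signers (AllHonest ps)  = not (Set.null (ps `Set.intersection` signers))
-- we don't know which party is the honest one, so every candidate must have signed
acceptable signers (SomeHonest ps) = ps `Set.isSubsetOf` signers

Under this reading, a proof signed by Chris alone is acceptable given AllHonest {Chris, D} but not given SomeHonest {Chris, D}, while a proof signed by both is acceptable under either assumption.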

I’m not entirely sure that I follow here. What’s your goal in crafting this data structure? Are you trying to allow for some system where we send different nodes different proofs in accordance with their assumptions? Are you trying to model the hierarchy which is sometimes present, where e.g. proof A might require assumptions 1 and 2, while proof B requires only assumption 2, so anyone who accepts A will accept B (but not the reverse)? Or something else altogether?

1 Like

Yes, sorry that was an error :sweat_smile:. I meant: There is a “1 proof”-to-“many assumptions” relation per proof.

Then we would bootstrap by agreeing on a set of canonical (proof, [assumption]) 1-to-many relations, e.g.

  • (trivial, [code])
  • (SNARK, [theorem1, theorem2])
  • (trusted delegation, [signature by myself])
  • etc.

For any new proof type that a node chooses to accept, it adds the assumptions to the “accepted assumptions list”, or it can add/remove assumptions explicitly, e.g. “accept signatures from this identity for trusted delegation proofs”. All this could be built on a sum type.
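A rough sketch of that bookkeeping (the representations are placeholders): a proof type is accepted exactly when every assumption it requires is in the node's accepted set, and explicitly accepting a proof type unions its assumptions into that set.

-- Hypothetical sketch of the "accepted assumptions list" mechanism.
import qualified Data.Set as Set

type Assumption = String
type ProofType  = String

acceptsProofType :: Set.Set Assumption -> [(ProofType, [Assumption])] -> ProofType -> Bool
acceptsProofType accepted relations pt =
  case lookup pt relations of
    Just required -> all (`Set.member` accepted) required
    Nothing       -> False

-- accepting a new proof type adds its assumptions to the accepted set
acceptProofType :: Set.Set Assumption -> [Assumption] -> Set.Set Assumption
acceptProofType accepted new = accepted `Set.union` Set.fromList new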

Yes, this would be the goal, as it seems useful for recursive multievaluation.

Then we could also do things like take assumptions of proof types and decompose them:
If I receive the relation (theorem1, [theorem3, theorem4]), including a valid proof of a type and assumptions I already accept, I could include that in the hierarchy to resolve theorem1 and use proofs that assume either theorem3 or theorem4. Note: Should these relations be (optionally) bijective, or do we want to have many-to-many ([assumption], [assumption]) relations too? Many-to-many ([proof type], [assumption]) seems less useful.

Or, if I want to accept delegation proofs, I add it via the canonical self delegation proof mentioned above. Doing this would generalize delegating delegation.

I think the only thing we need to do for this would be to store the (proof type, [assumption]) relations together with the proofs they came with, to reconstruct the hierarchy when needed, or just cache it.

We need to make distinctions of the type “I accept this proof type” and “I accept new relations verified by this proof type”, but this can happen on another layer that has access to the relevant information.

If we can agree on canonical bootstrapping types for the initial sum type, the dynamic fields could probably be content addressed and observer-dependent, with the functions being the same everywhere.

In that sense, we would only use the canonical base multiformat as an anchor for the individual hierarchy of every node.

How is this proof verifiable if the work of the trusted party who did the computation is said to not be checked?

If we imagine a certain zkp system fixed, it always implies a set of assumptions. Are they expected to be included in the set of assumptions? What if a different set of assumptions is provided that is compatible with the proof but doesn’t guarantee the desired properties for the proof? Does Proof_n carry its own set of assumptions to check against the assumptions provided by the verifier as an argument to compatible_n?

I see. Yes, my proposal is roughly this, in the sense that the multiformat would itself encode a set of proof types, each of which specifies their required list of assumptions (that’s the compatible function). I would typically see node operators as wanting to make choices as to what kinds of proofs to accept at the level of assumptions, not at the level of proof types - both are possible, but some of the assumptions will be parameterized (e.g. a trusted delegation proof assumption is parameterized by whoever is signing over their attestation to the results of the computation in question).

I see what you’re going for better here now, and this does seem like it would be powerful - but we’ll also need to decide what complexity of mathematical logic (and thereby ability to perform e.g. automatic deduction) we want to model.

I don’t follow this - in general, A \implies B definitely does not entail that B \implies A. What do you have in mind here / is there a concrete example?

Where was this “self delegation proof” mentioned, sorry? I tried Ctrl-F “self” but didn’t find anything, maybe I’m missing something obvious :laughing: .

Yes, I think in order to do this safely we need scoped assumptions - for example, I may trust a particular party to compute some kind of historical indexing correctly, but I don’t trust them enough to give them the ability to insert arbitrary logical implications into my system. I’m not sure why this would need to happen on another layer? It seems to me like the assumption type could encode this scope directly.

I don’t quite follow this, what are the dynamic fields? I think I get what you’re gesturing towards intuitively, in terms of being able to bootstrap off some initial proof types, but I still think we’d need sub-multiformats, and I don’t see an important distinction between one multiformat and sub-multiformats - we just don’t require agreement on the full multiformat. Or do you have some specific difference in mind here? An example might be helpful.

1 Like

In the case of a trusted delegation proof, the verifier verifies a signature by the trusted party over the results of the computation in question (which the verifier assumes that the trusted party will not create unless in fact the results are as they say they are).
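In sketch form (the signature-checking function here is a stand-in passed as a parameter, not a specific scheme), verification of a trusted-delegation proof reduces to a signature check over the claimed result:

-- Illustrative only; verifySig stands in for whatever signature scheme is actually used.
type PublicKey = String
type Signature = String
type Result    = String

data DelegationProof = DelegationProof
  { claimedResult :: Result
  , resultSig     :: Signature
  }

verifyDelegation :: (PublicKey -> Result -> Signature -> Bool)  -- signature checker
                 -> PublicKey -> DelegationProof -> Bool
verifyDelegation verifySig trustedKey p =
  verifySig trustedKey (claimedResult p) (resultSig p)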

Yes, and yes (the latter). Perhaps we can just change this interface to have the proof object return the assumptions directly, which would be more straightforward considering this discussion, I think.
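In sketch form (names still hypothetical), that change would make compatibility a derived subset check rather than a scheme-specific function:

-- Sketch of the revised interface: the proof object exposes its own assumptions.
import qualified Data.Set as Set

type Assumption = String
type Claim      = String

class ProofScheme proof where
  assumptions_of :: proof -> Set.Set Assumption   -- carried by the proof itself
  verify_n       :: proof -> Claim -> Bool

-- compatibility no longer needs to be defined per scheme
compatible_n :: ProofScheme proof => proof -> Set.Set Assumption -> Bool
compatible_n p accepted = assumptions_of p `Set.isSubsetOf` accepted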

1 Like

I might be naive here (not a logician), but some very simple model should be sufficient. I also don’t know if we need automatic deduction, or if it would fulfill the use cases if we just iterate through the explicitly constructed hierarchy. Maybe @Jamie can chime in?

My question is whether we want to be able to set a flag s.t. instead of “A \implies \{B, C, D\}” the relation that is added is “A \Leftrightarrow \{B, C, D\}”, or if we want to support many-to-one/many-to-many, like “\{B, C, D\} \implies A” or “\{A, B\} \implies \{C, D\}”.

Sorry, I was referring to the following, without properly naming it:

Yeah, that should also work. Since this feels like a decision based on side information I thought about it as perpendicular to the protocol, but we can in-band it.

I think what you’re describing is equivalent, but seems like a cleaner solution: agreeing on one canonical multiformat and then having however many user-defined multiformats that use the canonical one as a base, which determines whether custom hierarchies can interoperate.

1 Like

Ah, I see now. That seems like a potentially useful generalization to me, yes.

That’s possible. Could you write up a bit more specifically how a hierarchy of multiformats would work? I’m thinking of multiple / user-dependent prefixes (instead of just integers), but I have only a vague idea - maybe you have a specific one?

Using the data types described here, it would look roughly as follows (please excuse any potential errors in type theoretic notation, I hope this is clear enough):

Non-recursive version

type ProofRelation := ProductT [ CanonicalProofTypeV, ProductT [CanonicalAssumptionV] ]

type MultiProofT := CoproductT [ProofV]
type ProofV      := ProductT [ CanonicalProofTypeV, BasicValue ]

type CanonicalProofTypeV  := CoproductT [ DataValue ]
type CanonicalAssumptionV := CoproductT [ DataValue ]

Here, the DataValue in CanonicalProofTypeV and CanonicalAssumptionV are lists of encodings of proof types and assumptions and need to be picked at type instantiation. Then we can use content addressing to refer to any MultiProofT and need some mechanism to agree on the canonical base MultiProof. These proofs could live next to each other, and the node could just choose which ones to admit.

Recursive version

type ProofRelationAssertion := ProductT [ ProductT [ ProofTypeV, ProductT [AssumptionV] ],  AssertionV ]
type AssertionV             := CoproductT [ MultiProofV, Nothing ]

type MultiProofT := CoproductT [ProofV]
type ProofV      := CoproductT [ ProductT [ ProofTypeV, BasicValue ],  Nothing ]

type ProofTypeV  := CoproductT [ DataValue ]
type AssumptionV := CoproductT [ DataValue ]

(Note: ProofV and AssertionV are supposed to be specialized option types. See this question for context. ProofV might need to be a DataValue, but I’m not sure how expressive this needs to be.)

If we want to bootstrap a recursive system this way, the base ProofRelationAssertion and a base Multiproof could look something like this:

base_relation = [ "delegation proof by multisig", ["sig by ID1 exists", "sig by ID2 exists", ..., "sig by IDN exists"], Nothing ] 

base_multiproof(sigs) = {
  ProofV                = "delegation proof by multisig", sigs
}

base_relation could encode, for example, the trust assumptions of the network governance committee.

The next relation and an example proof:

next_level_of_relation = ["some new type", ["some", "new", "assumptions"], base_multiproof("sigs of ID 1 to N")]

next_level_multiproof(args) = {
  ProofV                      = "some new type", "something derived from args"
}

I’m sure this can be simplified or expressed more elegantly, but as a first approximation I think this is how we should be able to build things up recursively. Does this make sense? Please feel free to ask clarifying questions or point out errors.

1 Like

Ah, I think I get the recursive part - I was wondering more about the hierarchical multiproofs, which I had interpreted to mean different integer → proof type mappings (coproduct indices in this MultiProofT) for different multiproofs - but I don’t see that in your example, so maybe I misunderstood what you meant?

1 Like

Notes from discussion with @degregat :

  • Canonical assumption types fixed by specific protocol versions
    • e.g. SNARK assumptions, STARK assumptions, trusted delegate X
  • On the wire we can create new proof types with verify functions specified in known VMs which use new sets of known assumptions
    • We cannot create new assumption types
    • Need to prove that given known proofs, this proof actually holds given the specified assumptions
      • This needs to be spelled out in detail
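As a very rough sketch of what the second bullet might look like as data (every field name here is a hypothetical placeholder): a new proof type shipped on the wire bundles a verify program for an already-known VM together with references to already-known assumptions only.

-- Hypothetical sketch of an on-the-wire proof-type definition.
import qualified Data.ByteString as BS

data KnownVM = VMOne | VMTwo           -- placeholder identifiers for known VMs

type AssumptionId = BS.ByteString      -- reference to a canonical, already-known assumption

data NewProofType = NewProofType
  { verifierVM      :: KnownVM         -- which known VM runs the verifier
  , verifierProgram :: BS.ByteString   -- the verify function, as code for that VM
  , usedAssumptions :: [AssumptionId]  -- must draw only on known assumptions
  }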
1 Like

I have some very stupid comments to make and I may be miles off so forgive me if I am side-tracking the discussion.

I asked @cwgoes in private some questions regarding what proofs are we actually trying to formalize here, for what purpose, etc. Here is the copied discussion:

Artem:
1. Do I understand correctly that by proof we mean a zero-knowledge proof? If so, do we mean here a format for a general proof of any statement, or is this regarding a specific type of proof that is relevant for Anoma (e.g. a user U wants to prove that they own a resource R)?
2. Do we want to make a format backend-specific (e.g. we currently use Cairo backend iirc, should we adapt to it?) or are we aiming for a backend-independent mathy formulation?

3. The best way to proceed is probably to look at examples of the formats used either in mathematical research formulations of the topic (if what a “proof” is has been defined in some math paper, then it probably has some important properties as a math object, so we might want a type resembling the one presented in such papers) or in other implementations. Would you by chance know where I can look for such examples? I sadly do not know where exactly I could look

Christopher:
1. The proof type I’m trying to construct here is quite general, and would not always be zero-knowledge - being “zero-knowledge” is a property that some but not all proofs would have.
2. The latter - specifically we’re aiming for a mathematical definition and a multiformat which would allow us to process heterogeneous proof types automatically (based on whatever assumptions the node operator is willing to make, which may vary).
3. Hmm, maybe we could look in some theorem provers for what their internal “proof” type is (which often seems to incorporate assumptions e.g. believe_me in Idris iirc)? This is not entirely dissimilar to a runtime “theorem verifier” (where other nodes are doing the proving part by figuring out what proofs to send).

So this conversation points me to the fact that we are trying to formalize a very general notion of a proof here. Well, a model that type theorists use is the BHK interpretation. So from the type-theoretic view, a statement is just a type and a proof is a term of that type. In other words, the type of a proof is dependent on the statement you are trying to prove. This is an “outside” view of what a proof is. In the logic world, a proof relation can hence be seen as a relation between the powerset of propositions and the set of propositions (a relation which holds iff the proposition is derivable from a set of assumptions, which is itself a set of propositions).
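As a tiny illustration of that view (a Lean 4 example added here purely for concreteness): the statement is the type, and the term below is a proof, i.e. an inhabitant of that type.

-- propositions-as-types: "A ∧ B → B ∧ A" is a type, and this term is a proof of it
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun h => ⟨h.right, h.left⟩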

However, this is more of a side-point and one probably many already know.

So the real question is: for what purpose are we constructing this format? What are we going to do with these proofs? The answer to that will really decide what the format ought to be. From the discussion above I do not quite understand what part of Anoma ought to use this object and in what capacity. The rule of thumb is that you need a type for a thing X when you either need X to compute things (so the functions in which the thing is used can be assigned appropriate types) or - which is identical in some frameworks - you want to prove things (in the mathematical sense) about X, so you need to assign to it an appropriate mathematical object, i.e. formalize it.

I understand that we use zk-proofs for checking whether e.g. some resource actually is in possession of some user. So I may understand if we were to make some type of formalization of zk proofs that we are using. But that seems to me very backend-dependent (?).

On the other hand, I do not understand how we use general mathematical proofs in Anoma. Are such proofs needed to run some computations? If not, why do we need a separate definition from the one we have in the literature?

Again, sorry if I am completely misunderstanding the point.

1 Like

I would summarize the goals here as:

  • we want proofs which can be used in the process of computing things, where proofs of a computation can often substitute for running the computation ourselves (as a node)
  • these proofs / “proof-aware evaluation” will be used in transaction evaluation w/the resource machine, in evaluation of state and historical queries, possibly in transaction function execution, and possibly generally in other places where we’re computing new state changes or computing queries over the history of state changes which happened across the network
  • these proofs may take various specific forms (SNARKs, STARKs, plain old signatures, etc.)
  • different forms of proofs require different assumptions (some of which I would characterize as “cryptographic assumptions” and some of which I would characterize as “trust assumptions”)
  • we want to make the assumptions required for each proof type explicit, such that nodes can choose which assumptions they want to make (and which they don’t), and communicate to other nodes which kinds of proofs they will accept (so other nodes can send only those kinds of proofs)

Does that help?

1 Like

That makes things much clearer, thanks!

Then from what I understand, this is a mechanism that allows us to avoid re-running certain computations given some presuppositions. So is the point that, instead of running the computation, I can just run some code which “proves” to me that the computation has been carried out successfully? Or is it just a flag with no actual computational value itself? Like something that I see and say “oh, if I see this then I can trust this computation”, and if I don’t I say “aw-shucks, I have to run the computation myself, what a pain”?

1 Like

Yes, where what is necessary to constitute a sufficient proof is observer-dependent (up to what proofs exist and what assumptions an observer is willing to make).

1 Like

Could you by chance give me an example of the resource machine receiving some example of such a proof which is not a zk proof?

Yes - the resource machine will often process “trivial proofs” where verifying the proof requires simply computing the function, and it will also often process succinct proofs which are not zero-knowledge.

1 Like