Proper abstraction of instruction sets / virtual machines

The objective of this topic is to discuss how to properly abstract specific virtual machines such as Nock, Cairo, and RISC0 in the Anoma specification. At the moment, a virtual machine, which we will take here as simply a way to represent arbitrary classically-computable functions, is required in a few places:

  • in the protocol where we need canonical commitments to and evaluatable representations of functions, such as external identities
  • in the resource machine, for resource logics
  • in the resource machine, for transaction functions
  • in application definitions, for projection functions

In general, we do not need to use the same virtual machine in all of these places, or even a specific virtual machine for a specific place - we can support interoperability between multiple representations; we just need a way to demarcate what representation is in use so we can interpret data as the correct code.

In certain cases, however - such as external identities - we do need a way to understand when two representations are equal, i.e. extensional function equality. To compute a canonical succinct commitment on the basis of which equality will be judged, we also need a canonical representation for the preimage. It may be possible to work with a weaker notion (commit a = commit b entails a = b, but a = b does not necessarily entail commit a = commit b) and use runtime equality judgements (a la "signs for" and "reads for", which we need anyways); I'm not sure yet.

First, however, what exactly is a virtual machine? Let's try to capture a definition in math (a rough code sketch follows the list):

  1. A virtual machine is an encoding of arbitrary classically-computable functions defined by two (type-parameterized) functions:
    • deserialise : bytes -> Maybe (Repr T) which attempts to deserialise a bytestring into an internal representation of a function of type T.
    • serialise : Repr T -> bytes which serialises an internal representation of a function of type T into a bytestring.
    • These functions must be mutually inverse, i.e. deserialise . serialise = Just . id, and whenever deserialise b = Just r, serialise r = b (so serialisation round-trips exactly on every bytestring that deserialises successfully).
  2. An evaluatable virtual machine is a virtual machine with an additional function evaluate : (Repr (T0 -> T1)) -> Repr T0 -> Repr T1, which simply calls a function on the provided argument (in an internal representation). evaluate is required to be deterministic.
  3. A provable virtual machine is a virtual machine with two additional functions parameterized over a proof type P:
    • prove : Repr (T0 -> T1 -> T2 -> boolean) -> Repr T0 -> Repr T1 -> P generates a proof, where Repr T0 is understood as the private input, and Repr T1 as the public input.
    • verify : Repr (T0 -> T1 -> T2) -> Repr T1 -> P -> boolean verifies a proof for a given program and public input.
    • Should P not reveal any information about Repr T0, the provable virtual machine can be said to be zero-knowledge.
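
As a rough, non-normative sketch, here is how these three layers could be rendered as Haskell type classes. Names and signatures are illustrative only; in particular I collapse the program type for prove/verify to a single two-argument predicate over a private and a public input, sidestepping the exact arity question for now.

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.ByteString (ByteString)

-- 'repr t' is a VM's internal representation of a value or function of type t.
class VirtualMachine repr where
  deserialise :: ByteString -> Maybe (repr t)
  serialise   :: repr t -> ByteString
  -- Laws: deserialise . serialise = Just . id
  --       whenever deserialise b = Just r, serialise r = b

class VirtualMachine repr => Evaluatable repr where
  -- Deterministic application of a represented function to a represented argument.
  evaluate :: repr (a -> b) -> repr a -> repr b

-- 'p' is the proof type of the provable virtual machine.
class Evaluatable repr => Provable repr p where
  -- 'priv' is the private input, 'pub' the public input.
  prove  :: repr (priv -> pub -> Bool) -> repr priv -> repr pub -> p
  verify :: repr (priv -> pub -> Bool) -> repr pub -> p -> Bool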

I think these definitions should be (roughly) sufficient for our purposes. They do not answer, for now, the question of how to convert between representations in a semantics-preserving way (i.e., compilation), which may be useful to introduce into the protocol structure in the future, but I think we do not need it yet.

Following this route, in the specs we would then define how each of Nock, Cairo, and RISC0 instantiate these types. Before doing that, I wanted to write this up and seek input from @vveiln @jonathan @degregat @mariari @Moonchild @ray .


Seems fine generally. Mainly I think types is 'the' problem, and other than that there is not much that can go wrong.

Presumably missing a '-> bool' at the end?

Not sure if it's an intentional omission, but we should ideally have a way to limit gas/execution time.


Ah yes - fixed.

True, I will include this as well.

Are T2 and P here the same thing?

The definition is broad, but should it also require correctness and soundness properties for a provable VM (not necessarily a ZK VM)? It feels odd to talk about zero-knowledge without correctness and soundness implied.


Ah, I miswrote the type, there should be a boolean - fixed.

Yes, the definition should include correctness and soundness properties.
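
Roughly, writing the represented program as a relation R over a private input w and a public input x (glossing over the exact arity above), I would expect something like:

\begin{aligned}
\textbf{Completeness:}\quad & R(w, x) = \mathsf{true} \;\Rightarrow\; \mathsf{verify}(R,\, x,\, \mathsf{prove}(R, w, x)) = \mathsf{true} \\
\textbf{Soundness:}\quad & \mathsf{verify}(R,\, x,\, p) = \mathsf{true} \;\Rightarrow\; \exists\, w.\; R(w, x) = \mathsf{true} \quad \text{(except with negligible probability)} \\
\textbf{Zero-knowledge:}\quad & p \text{ reveals nothing about } w \text{ beyond the fact that } R(w, x) = \mathsf{true}
\end{aligned}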


This sounds like it falls into the domain of GEB, so once these problems are solved there, we should use GEB, or port over the needed formalism/approaches from it.


That is also my hope - cc @terence would you want to chime in here?


Yes, that's exactly right - indeed, one of the things which Geb will automatically provide is, given two specifications, a specification of what it means to translate between them in a semantics-preserving way: a universal-property-preserving functor between theories (where theories are defined as the universal properties they exhibit).

(It should also automatically provide executable specifications in the form of continuations - anything that you don't implement is "implemented" in terms of a continuation which effectively means "what would/could happen if you implemented and executed this".)


Thinking about this some more, I realize that I've been assuming that virtual machines can typecheck their internal representations, which is not necessarily always going to be the case. However, we need some way to encode how inputs are supposed to be passed to functions, and evaluate should also be able to handle multiple inputs. Maybe some virtual machines will need the ability to fail if they return the wrong type. Also, we need to introduce the concept of "gas".

Additional aspects to standardise here in evaluate (rough shape sketched after the list):

  • errors (out-of-gas, type-mismatch, etc.)
  • logs (standard data representations)
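
For instance (names illustrative only, continuing the Haskell sketch above):

import Data.ByteString (ByteString)
import Numeric.Natural (Natural)

data EvalError
  = OutOfGas
  | TypeMismatch      -- e.g. a VM that cannot typecheck fails at runtime
  | DecodeFailure
  deriving (Show, Eq)

-- What evaluate would return: an error or a result, plus logs and gas consumed,
-- in a standard data representation shared across virtual machines.
data EvalOutcome t = EvalOutcome
  { result  :: Either EvalError t
  , logs    :: [ByteString]
  , gasUsed :: Natural
  }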

Multiencode/multidecode:

Multiencode takes a value in internal representation and a set of preferences, and tries to encode it in a bytestring according to those preferences. To convert between virtual machine representations, this will require compilation (which will not be supported in all cases). Multiformat codes will be included to indicate which formats are in use.

Multidecode takes a value in bytestring representation and tries to decode it into internal representation, according to the multiformat information and the data encoding and virtual machine formats known by the decoding party. Attempting to decode unknown formats will result in an error.

In general, canonical commitments to data and code are made over the output of multiencode. multiencode is also used before storing data or sending it over the network. Any party should be able to take any piece of stored or sent data and a type and attempt to deserialise it with multidecode.
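
Concretely, I imagine signatures along these lines (continuing the sketch above; Preferences and the multiformat code type are placeholders for whatever we end up standardising):

import Data.ByteString (ByteString)
import Numeric.Natural (Natural)

type Multiformat = Natural        -- multiformat code naming a data encoding or VM

data CodecError
  = UnknownFormat Multiformat     -- decoding party does not know this format
  | CompilationUnsupported        -- required VM-to-VM conversion is not available
  deriving (Show, Eq)

newtype Preferences = Preferences
  { preferredFormats :: [Multiformat] }  -- in order of preference
  deriving (Show, Eq)

-- 'repr' is a virtual machine's internal representation, as before.
class MultiCodec repr where
  multiencode :: Preferences   -> repr t     -> Either CodecError ByteString
  multidecode :: [Multiformat] -> ByteString -> Either CodecError (repr t)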

Open questions:

  • Implicit FFI with multi-virtual-machine functions? Can we have a multievaluate?
  • Can VMs be passed functions from the environment? How will this work?
  • Do we need both proof systems and provable virtual machines?

I think some additional ā€œmultiā€-functions may make sense:

Multievaluation evaluates the application of a function to a value, where the function itself may return another application (perhaps in a different virtual machine) to be further applied.

edit: it probably makes more sense for multievaluate to pass … a pointer to multievaluate to the VM, so that the VM can call multievaluate within itself, receive the response, and continue execution - more straightforward than CPS.

(we should think carefully about the types here…)

Multicompilation attempts to compile functions represented in a data value based on a set of preferences. Compilation will fail if unknown VM conversions are attempted.
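
Very rough types for these (continuing the sketch; MultiValue just pairs a payload with the multiformat code of its representation):

import Data.ByteString (ByteString)

-- A value or function tagged with the format (data encoding or VM) it is in.
data MultiValue = MultiValue
  { mvFormat  :: Multiformat
  , mvPayload :: ByteString
  } deriving (Show, Eq)

data MultiError
  = UnknownConversion
  | EvaluationFailed
  deriving (Show, Eq)

-- Host-side evaluation across formats. The same function is also handed to the
-- VM as a callback, so the VM can call multievaluate itself, receive the
-- response, and continue execution (rather than CPS).
type MultiEvaluate
  = MultiValue -> [MultiValue] -> Either MultiError MultiValue

-- Attempt to (re)compile the functions inside a value according to preferences;
-- fails if an unknown VM-to-VM conversion would be required.
type MultiCompile
  = Preferences -> MultiValue -> Either MultiError MultiValue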

We should also talk about the guarantees provided, e.g.:

  • multiencode a = multiencode b → a = b
  • multiencode a /= multiencode b does not imply a /= b
  • separate "evidence" type that a = b, which can include:
    • multicompile (some preferences) a = b

This evidence type should carry a Proof (in the proof system sense), which could be of varying practical forms:

  • replicating multicompile oneself
  • trusting anotherā€™s run of multicompile
  • succinct proof of computation correctness

The latter implies the ability to express compilation of one VM to another in terms of a "base" VM, which we (= everyone) knows how to verify proofs for.
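
Sketching the shape of that evidence type (continuing the sketch; ExternalIdentity and Signature stand in for whatever the identity layer provides):

import Data.ByteString (ByteString)

type ExternalIdentity = ByteString   -- placeholder
type Signature        = ByteString   -- placeholder

-- How we came to believe the claimed equivalence.
data EquivalenceProof
  = ReplicatedLocally                      -- we ran multicompile ourselves
  | AttestedBy ExternalIdentity Signature  -- we trust another party's run
  | SuccinctProof ByteString               -- proof of correct compilation in a "base" VM
  deriving (Show, Eq)

-- Evidence that two representations denote the same function, e.g. because
-- multicompile prefs lhs = rhs, justified as described by 'how'.
data EqualityEvidence = EqualityEvidence
  { lhs   :: MultiValue
  , rhs   :: MultiValue
  , prefs :: Preferences
  , how   :: EquivalenceProof
  } deriving (Show, Eq)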

I think actually all these types (BasicValue, BasicType, Proof, etc.) probably need to be defined with mutual recursion.

Notes to self:

  • move the natural number multiformat basics to Basic Abstractions (but keep the specific lookup tables in Concrete Implementation)

A related question: how can virtual machines execute queries? I think we could build a sort of generic interface where VMs are passed a query function that implements this kind of functional namespace. That should be sufficient for pretty much everything. Queries will only make sense in certain contexts - e.g. in transaction functions, but not in resource logics - so we shouldn't pass this function all the time, only in those contexts which make sense.
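
In types, that generic interface might look something like this (continuing the sketch; namespace keys and values are left as bytestrings for now):

import Data.ByteString (ByteString)

-- A functional namespace: look up a key, get a value if it is known.
type QueryFn = ByteString -> Maybe ByteString

-- A context either provides the query function or forbids queries entirely
-- (e.g. transaction functions vs. resource logics).
data EvalContext
  = WithQueries QueryFn
  | NoQueries

-- Hypothetical evaluation signature: the context is passed alongside the
-- function and its arguments.
type ContextualEvaluate
  = EvalContext -> MultiValue -> [MultiValue] -> Either MultiError MultiValue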

With multievaluate, we can have a version of this function that acts similarly to parts of the identity machine, and uses available evidence + preferences to use certain known-to-be-equivalent versions of functions (e.g. compiled versions) instead of others. Then "JIT" compilation can happen e.g. asynchronously once certain statistical thresholds are met.

Optionally, this "smart multievaluate" can also use known proofs of the results of certain evaluations (instead of repeating the evaluations). If we want, we could even build a version that ships off some evaluations to a compute service provider (although this would usually be too high-latency compared to pre-coordinating the split of compute work).

Is the idea to make an IR (or a set of IRs) with a canonical set of features, and then have compilers from these IRs to the different virtual machines? And we want an interface which must be satisfied by all of them? I imagine the set of features will vary based on the application. I would assume the programs required for transactions use fewer features than those for resource logics.

The obvious thought I had was a compiling-to-categories approach where we define some minimal category of required components from which programs are freely generated. Each of these components (along with the composition methods) would then be given semantics via a functor out of this free category, producing an actual program written in the virtual machine code. But maybe I'm misunderstanding what the goal is.

This seems very similar to the notion of monoidal computer presented in, for example, this. To rephrase it a bit, we have a category of types, and, among those types, is a type of programs, P (could be Bytes), such that, for any pair of types A and B, there is a surjective function which turns any morphism from X to P in (the largest cartesian sub-category of) our category of types into a function from X ⊗ A to B in our category of types. The X allows us to reason about a family of programs, and taking X to be 1 gives us specific programs. For each P and pair of types, there are universal application morphisms \{\}^{AB} : P ⊗ A → B. One could imagine our category having several Ps, with separate universal applications, corresponding to the different virtual machines. This formulation of computation is designed to deal with first-order computation (in contrast to cartesian closed categories [and Turing categories], which are geared toward higher-order computation). This formulation may assist with the "how to convert between representations in a semantics-preserving way" question.
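
In symbols, roughly: for every pair of types A and B there is a universal application \{\}^{AB} : P ⊗ A → B such that, for every X (in the cartesian sub-category), the map

\mathcal{C}_{\mathrm{cart}}(X, P) \;\longrightarrow\; \mathcal{C}(X \otimes A,\, B), \qquad g \;\longmapsto\; \{\}^{AB} \circ (g \otimes \mathrm{id}_A)

is surjective.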

One thing that's not clear to me is what Repr T is supposed to be. Are we assuming some decoding function decode : Bytes -> Maybe T, and Repr T is a Bytes value that's known to return a T under decode? Or, at least, something like that?

prove : Repr (T0 -> T1 -> T2 -> boolean) -> Repr T0 -> Repr T1 -> P
verify : Repr (T0 -> T1 -> T2) -> Repr T1 -> P -> boolean

If I'm understanding this properly, we have a (representation of a) function f : T0 x T1 -> T2. There's also the relational f^ : T0 x T1 x T2 -> 2 := (λ(a, b, c). f(a, b) == c). Then we must have something like

  • verify(f, t1', prove(f^, t0, t1)) = ???

I'm honestly not sure. How are the two t1's supposed to be compared? Is this a typo and we're supposed to have

verify : Repr (T0 -> T1 -> T2) -> Repr T2 -> P -> boolean

? In that case, this makes sense to me

  • verify(f, t2, prove(f^, t0, t1)) = f^(t0, t1, t2) = (f(t0, t1) == t2)

Some things about this seem overly complicated. Why can't we assume that λ(a, b, c). f(a, b) == c is representable whenever f is? In that case, we can provide f to prove. Or am I completely misunderstanding what's trying to be accomplished?

Also, it seems to me that the verify/prove functions should be essentially automatic consequences of choices of more primitive operations. One of those should be a choice of one-way function. This is a point being made in this mathoverflow answer (coincidentally by the author of the paper I mentioned). I'm not saying we should follow that, exactly, but there may be a simpler set of required functions which are then used to define what it means to commit to and verify an execution. I'll have to put more thought into it.


Sort of, but not exactly. My aim is to come up with:

  • a standardized way to represent typed data, where we can support multiple encoding methods (of serializing typed data to / deserializing typed data from bytestrings), and which encoding method a particular piece of typed data is serialized with is clearly specified
  • a way to represent functions that does not require a single standardized IR, where:
    • different functions can be represented in different virtual machines
    • which virtual machine a given function is represented in is clearly specified
    • nodes who know how to convert representations between different virtual machines in a semantics-preserving way (compilation) can do so if they want

The latter is particularly useful because:

  • which VM is most suitable depends on which function is being represented (as you mention)
  • it would allow for gradual upgrades across the network more easily, where (for example) the same function could be represented in different VMs, and nodes unaware of the new VMs could use the old versions

Does that make sense?


Also, we need to introduce the concept of "gas".

I think there will be a natural way to express this in a language (e.g. Geb) whose type system explicitly encodes category theory, in particular enriched category theory: in terms of metric categories such as Lawvere introduced (see for example Metric Spaces as Enriched Categories II | The n-Category Café). In particular it should be possible to write lifts of circuits in circuit categories to gas-enriched circuit categories, allowing a separation of concerns between the semantics of a circuit and its cost, and allowing computation of the cost of the same circuit under different assumptions about the gas costs of operations.
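
As a concrete toy illustration of that separation of concerns (just Haskell with a Writer monad, not the enriched-categorical construction itself): the same expression can be interpreted once for its value alone and once for value plus gas, with the gas schedule supplied as a parameter, so the same program can be costed under different assumptions about operation costs.

import Control.Monad.Writer (Writer, runWriter, tell)
import Data.Monoid (Sum (..))

-- A tiny expression language standing in for "circuits".
data Expr = Lit Integer | Add Expr Expr | Mul Expr Expr

-- Gas costs of the primitive operations, supplied separately from the semantics.
data GasSchedule = GasSchedule { addCost :: Integer, mulCost :: Integer }

-- Plain semantics: just the value.
evalPure :: Expr -> Integer
evalPure (Lit n)   = n
evalPure (Add a b) = evalPure a + evalPure b
evalPure (Mul a b) = evalPure a * evalPure b

-- The same structure lifted to a cost-aware interpretation: value and gas are
-- computed together, parameterised by the gas schedule.
evalCosted :: GasSchedule -> Expr -> Writer (Sum Integer) Integer
evalCosted _ (Lit n)   = pure n
evalCosted g (Add a b) = tell (Sum (addCost g)) >> ((+) <$> evalCosted g a <*> evalCosted g b)
evalCosted g (Mul a b) = tell (Sum (mulCost g)) >> ((*) <$> evalCosted g a <*> evalCosted g b)

-- runWriter (evalCosted (GasSchedule 1 5) (Mul (Add (Lit 2) (Lit 3)) (Lit 4)))
--   == (20, Sum 6)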

This is a neat read, but I don't know how it's to be computed: specifically, can it be done piecemeal alongside the computation, or do we need to compute cost in its own computation? I ask because evil impure VM side-effects will sometimes happen outside computation semantics but incur a cost in cost semantics.


Good question. I don't know and would have to work out some examples to give an at-all-confident answer, but my initial hunch is that it would look something like this:

  • Assuming we have enough reflection to allow us to decompose any functions down to the primitive ones, we can automatically compute a worst case with a general algorithm.
  • However, in practice, this computation will often produce a result which is so pessimistic (even with respect to the worst case, let alone the amortized average) that the reservation of resources required to execute under it would be too large for it to be useful.
  • However, that problem might be mitigated to a sufficient degree to make it practical by gradually building up a library of common algorithms (the kind that come straight from papers and books) with manually proven worst-case and/or amortized bounds. Typical "app-level" development would then depend mainly on using those algorithms, and the automated computation of the composition of their bounds for a specific app would come adequately close to the real worst case, so most app developers could have useful worst cases computed automatically without having to write manual proofs that such-and-such function is O(n log n) or whatever (toy sketch below).
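
A toy sketch of what that composition could look like (all bounds and constants here are made up; the point is only that app-level worst cases fall out mechanically from manually established library bounds):

-- Worst-case gas as a function of input size; library entries are proven by hand,
-- app-level bounds are composed automatically from them.
type Size     = Integer
type GasBound = Size -> Integer

ceilLog2 :: Integer -> Integer
ceilLog2 n = ceiling (logBase 2 (fromIntegral (max 2 n)) :: Double)

-- Manually proven bounds for two hypothetical library algorithms.
sortBound, lookupBound :: GasBound
sortBound n   = 50 * n * ceilLog2 n
lookupBound n = 20 * ceilLog2 n

-- Automatically composed bound for an app that sorts once, then does k lookups.
appBound :: Integer -> GasBound
appBound k n = sortBound n + k * lookupBound n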

To expand on my previous comment a bit, I think that, for example, seL4 uses mostly (maybe all?) automated longest-path analysis (there's a discussion at https://trustworthy.systems/publications/nicta_full_text/5859.pdf but I've only skimmed it). However, I also suspect that, within a kernel such as they're writing, the algorithms are very close to the foundations-of-computer-science kinds, and they're constrained to the ones they've written themselves. (That paper spells out explicitly the numbers of certain operations that can occur between preemption points, for example.) In our context, where we're hoping to analyze bounds on the executions of a very open-ended class of user-written algorithms, automated analysis might often be much more pessimistic.