Review of Nock as a VM language

For about a month, I’ve been carving out time to write up a review of Nock. The opinions expressed here are mine and mine only, but I think it’s fair to say that the general conclusion reflects the experience of the entire Juvix team over the past year or so.

Some time ago we had a brief discussion about the rationale for choosing Nock as the VM language. I didn’t want to sound too confrontational, which may have given the impression that I consider Nock to be generally “fine”. I don’t really think so. My view of Nock has clarified as a result of increased familiarity and understanding, and of the experiences of the Juvix compiler team. I believe my assessment should be stated clearly and thoroughly, hence this post.

There are a number of mostly minor issues which I try to list here, many of them probably not very important or relevant from Anoma’s perspective.

However, the main issue remains: Nock (and everything around it) is unnecessarily inscrutable and obfuscated, both in its presentation and in its actual design, to the point that claims about ease of analysis or auditing seem debatable. I don’t think this is completely solved by my writing up a reformulation of Nock. I don’t see a reason to stick with Nock specifically, beyond the sunk cost fallacy. If it were up to me, I would replace Nock with an analogous “extended subset” of Scheme in “operator” form, but with more mainstream presentation and design choices. Transitioning should not then be a problem. I give an example of such a language at the end of this long post.

My main point is: this whole obfuscation is harmful and completely unnecessary, so why not just get rid of it? Why choose an obscure VM language with no significant adoption or ecosystem, when it’s not difficult to come up with a clearer one satisfying the desired criteria?

More detailed issues follow. Obfuscation and idiosyncratic design are addressed specifically further down the post.

Reasoning about programs

Suppose \approx is an equivalence relation on programs that determines when two programs “behave in the same way”, i.e., are interchangeable. One would expect that, for any two functions f and f', if f \approx f' then g f \approx g f' for any (higher-order) function g. In concrete terms, if we can prove f \approx f', then we should be able to prove e.g. \mathrm{map}\, f\, l \approx \mathrm{map}\, f'\, l. In Nock, the general statement (for arbitrary g) is false for every equivalence relation except syntactic identity.

The problem is with “uncontrolled” reflection. Since closures can not only be called but also have their concrete representations inspected, no two functions with differing representations can be treated as equivalent in every evaluation context. The most one can say is that if f \approx f' then g f \approx g f' for any g which does not inspect the representation of its argument. The latter condition is unwieldy to state and tedious to prove. This makes reasoning about the equivalence of higher-order programs difficult and cumbersome.
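To make this concrete, here is a minimal Haskell sketch (a toy expression type of my own, standing in for Nock code): two function bodies agree on every input, yet a context that inspects representations distinguishes them.

```haskell
-- Toy illustration (not actual Nock): inspectable "closures" break congruence.
data Expr = Lit Int | Arg | Add Expr Expr deriving (Eq, Show)

eval :: Expr -> Int -> Int
eval (Lit n)   _ = n
eval Arg       x = x
eval (Add a b) x = eval a x + eval b x

f, f' :: Expr
f  = Add Arg (Lit 0)   -- \x -> x + 0
f' = Arg               -- \x -> x

-- A context that only calls its argument respects f ≈ f' ...
gCall :: Expr -> Int
gCall body = eval body 42

-- ... but a context that inspects the representation does not.
gInspect :: Expr -> Int
gInspect (Add _ _) = 0
gInspect _         = 1

-- eval f and eval f' agree on all inputs, yet gInspect f /= gInspect f'.
```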

Nock’s closure semantics differ in this respect from normal Lisp semantics. In Common Lisp or Scheme, when you evaluate a function f, you get a “black box” that cannot be inspected by ordinary operations. Some implementations allow you to decompile or disassemble it, or even keep the source around, but this is implementation-dependent and in practice seems to be used sparingly, mostly for debugging purposes. For the purposes of reasoning/verification, these “reification” operations can be ignored or (partly) detected by simple syntactic checks. In Nock, you can look into the representation of a closure without any explicit conversion, and it’s harder to see where you actually do that.

This will have to be dealt with one way or another – it will probably come up, e.g., in the compiler verification project. Changing this aspect of Nock explicitly might not be necessary, but it’s good to be aware of the issue.

One way out is to simply ignore the problem – the Juvix compiler (currently) never generates code that inspects function representations, so it may be assumed that this never happens for any function provided as an argument. To facilitate formal reasoning about this, a version of Nock with “boxed” closures would probably need to be defined, with equivalence between it and ordinary Nock simply assumed for functions originating from Juvix. To facilitate interoperability with non-Juvix-generated code, some static annotations may be desirable.

I understand the intention here is that we ultimately want a deeply reflective system along the lines of Smalltalk. But even in Smalltalk, for runtime code modification you need to explicitly fetch the source and then explicitly compile it (but correct me if I’m wrong). I’m not sure how often you actually do that outside of debugging/visualization/hot-swapping/interactive programming tools. This is not an unsolvable problem – there exist formalizations of Smalltalk semantics which have been used to reason about compiler transformations. But it does make certain things more difficult.

This is essentially a “typing” issue. For reasoning about equivalence of higher-order programs, you need to distinguish between inspectable “code” and black-box “closures”, even if your language does not make this distinction.

Performance

The performance issue mostly impacts the ease of generating efficient code or writing a fast VM, not the possibility. If we really only care about running programs with a few branches and a couple hundred loop iterations, or we’re always IO-bound anyway, it’s not an issue at all.

Nock is a low-level language, but its primitives don’t map well onto operations available on modern commodity hardware. Of course, it can be compiled, but the mismatch makes it more difficult to write a good native-code compiler or a fast VM. Realistically, we’re never going to do it (the compiler at least). The easiest ways to compile Nock would be to translate it back to something like a JuvixTree representation (which can feasibly be done only partially), to translate it to e.g. Chez Scheme and use its incremental compiler, or to try GraalVM (not sure how well that would work, but it could be reasonable).

Any interpreter for Nock that isn’t very sophisticated will be relatively slow. Briefly: pointer chasing with S-expressions. In contrast to ordinary Lisps, S-expressions are not only a basic data structure but are also explicitly used to represent the environment. Unless the interpreter intelligently converts the environment into fixed-size frame arrays or some kind of contiguous stack(s) whenever possible, accessing a function argument, a local variable or a library function requires n pointer dereferences, where n is how deep into the subject the argument/variable/function is stored. On modern hardware this is expensive, as it often induces CPU cache misses.
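As a rough illustration (Haskell sketch, types and names mine), variable access against an S-expression environment walks the spine one cell at a time:

```haskell
-- Naive subject/environment as a linked tree of cells.
data Noun = Atom Integer | Cell Noun Noun

-- Fetching a variable stored n cells deep in the subject walks n pointers,
-- each dereference a potential cache miss; with a fixed-size frame array
-- this would be a single indexed load.
nthVar :: Int -> Noun -> Noun
nthVar 0 (Cell h _) = h
nthVar n (Cell _ t) = nthVar (n - 1) t
nthVar _ (Atom _)   = error "nthVar: variable index points into an atom"
```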

I looked into the C implementation of the Nock interpreter in the Vere runtime of Urbit. It just seems to compile Nock into undocumented lower-level bytecode that can presumably be executed with more reasonable efficiency. But what’s the point of Nock then? A convenient IR? It’s not convenient – it’s obscure.

Nock was clearly not designed with curried languages in mind. It lacks primitives/builtins/jets for closure manipulation (an apply_n function or its components).

Benchmarks

Benchmarking Nock VM implementations is problematic because, to avoid jetting, the benchmarks can’t use arbitrary arithmetic – only increment and equality. I tried to get a very rough idea of the relative performance of the Nock VM in Urbit (via urbit eval), the Elixir Nock VM, the JuvixCore evaluator, OCaml bytecode, OCaml native code and Chez Scheme. I benchmark two kinds of programs:

  1. counting up to 10 million,
  2. mapping identity 10 times on a list of 1 million elements.

These benchmarks are of course not very thorough, but they do give a general idea of the performance of direct function calls, variable access, essential arithmetic and data structure handling.
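For reference, the two benchmarks amount to roughly the following (a Haskell rendering of the idea; the actual programs were Juvix sources compiled to Nock):

```haskell
-- Benchmark 1: counting up to 10 million using only increment and equality.
count :: Int -> Int
count n = go 0
  where go k = if k == n then k else go (k + 1)

-- Benchmark 2: mapping the identity function 10 times over a 1M-element list.
mapId :: Int -> [Int] -> [Int]
mapId 0 xs = xs
mapId k xs = mapId (k - 1) (map id xs)

main :: IO ()
main = do
  print (count 10000000)
  print (length (mapId 10 (replicate 1000000 0)))
```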

For Nock VMs, the Nock code was generated by the Juvix compiler. For the benchmarks I chose, the code generation is rather straightforward and there isn’t much that could be improved there – pattern matching on lists translates directly to an isCell check, function calls just replace the arguments and use Nock’s call operator, no closure-handling code, no calls to jetted functions, the Anoma stdlib is not even bundled.

The benchmark programs and detailed results are available here: https://github.com/anoma/nock-benchmarks. I take into account differences in VM startup times (subtracting them).

Conclusions

  • Urbit Nock VM is around 4 to 30 times slower than OCaml bytecode, which in turn can be more than 10 times slower than Chez Scheme or native OCaml.
  • The JuvixCore evaluator (a naive one I wrote in Haskell) is up to about 2 times slower than the Urbit Nock VM.
  • The Elixir Nock VM is around 20 to 60 times slower than the Urbit Nock VM. I know, it’s not meant to be fast at this point.
  • Chez Scheme is significantly faster than native OCaml on list manipulation, but slower on the counters. Generally, they seem to fall into a similar performance category.

Unclear design choices

Nock is fundamentally a Lisp-like lambda-lifted VM language with variables represented by tree-path indexing (in this respect an inessential generalization of de Bruijn indices), but it makes some idiosyncratic design choices untypical of Lisps or VM bytecode. These choices are not justified anywhere I could find, and their utility is often dubious. Even if otherwise neutral, every non-standard choice is negative by default, making the language less understandable and harder to use, with potentially other unanticipated consequences. Every such choice should therefore have a clear benefit. Perhaps some of them do, but lacking any explanation and being unable to infer one myself, I assume a lack of utility.

Many of the points below are minor, some possibly result from me misunderstanding things, some could be safely ignored, but taken together they contribute to the unnecessarily obscure nature of Nock.

  1. Atoms don’t evaluate to themselves but always need to be quoted.

This goes against usual Lisp semantics. Normally, e.g., 5 evaluates to 5 – when you type it in the REPL you don’t get an error. So you can also type (+ 1 2) instead of (+ (quote 1) (quote 2)). The quoting of atoms is easy to forget, even when you know about it. What is the purpose of erroring on atom evaluation? Having one less rule in the evaluation semantics? This doesn’t seem reasonable – making evaluation on atoms implicitly error doesn’t make the case disappear.

  2. Non-standard semantics of closures.

This has been discussed above under “Reasoning about programs”. I can imagine this may have some benefits with regard to the deeply reflective live systems that are intended to be built on top of Nock, but do these benefits really outweigh the costs? Maybe having explicit reify would be a bit better for reasoning while still retaining the benefits of unrestricted reflection? Not insisting much on this particular point.

  3. Complex primitive operations.

Typically, in a low-level VM language one would expect each primitive operation to map to a fixed set of CPU instructions, with the set ideally not depending on the actual values of the operation’s arguments. On the one hand, Nock tries to be a “low-level functional assembly”, but on the other it includes as primitives complex operations like structural equality or general tree indexing.

The purpose of a VM language is to choose the granularity of operations which are convenient for your intended source language(s), but at the same time can themselves be implemented reasonably efficiently and simply.

  4. General tree indexing.

Normally, Lisps have car and cdr (fst and snd). Nock compresses nested pair-component accesses into a single “slot” operation. This could make sense in terms of reduced bytecode size, but see the previous point. A reasonable middle ground could be to have an “index” primitive which fetches the nth list element. This could still (sometimes) be compiled to fixed-instruction array indexing. With nested indexing this is more problematic, so it would make sense to break those up.

There seems to be no clear benefit to having general tree indexing in a VM language. To do anything with it (compile, reason about, manipulate programmatically), you’d need to break it up anyway or otherwise deal with its complexity. For compiling functional languages, list/array indexing is sufficient – even if your language happens to have general tree indexing, the compiler can easily break it up without excessive code bloat.
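To illustrate, the slot operation is just a binary-encoded chain of head/tail steps, which is exactly the decomposition a compiler would perform (Haskell sketch of the standard axis rules):

```haskell
data Noun = Atom Integer | Cell Noun Noun

-- Nock's general tree indexing: axis 1 is the whole noun, axis 2k is the
-- head of axis k, axis 2k+1 is the tail of axis k. Decoding the axis bits
-- reduces "slot" to a chain of car/cdr steps.
slot :: Integer -> Noun -> Noun
slot 1 n = n
slot a n
  | a < 1     = error "slot: invalid axis"
  | even a    = hd (slot (a `div` 2) n)
  | otherwise = tl (slot (a `div` 2) n)
  where
    hd (Cell h _) = h
    hd (Atom _)   = error "slot: head of an atom"
    tl (Cell _ t) = t
    tl (Atom _)   = error "slot: tail of an atom"
```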

  5. Structural equality as a primitive operation.

It would seem more fitting to have this as a builtin/jet. For completeness it suffices to have equality on atoms, or even only a test for zero.

  6. Ifs but no switches.

There is a good reason why Cranelift, LLVM, WASM, BEAM and the JVM all have a switch instruction: it can be implemented efficiently on typical hardware and it’s useful for compiling pattern matches. In Nock, these need to be broken down into ifs. It’s easy to group them back together, but to do that transformation the VM/compiler needs some way to represent a switch…

  7. Increment only with no other arithmetic primitives.

What’s the point of this? I guess we’re not playing the game of minimizing the number of operators at all costs? In practice, this “simplification” is largely illusory. Typically, when reasoning about or manipulating programs, the arithmetic operations can either be treated uniformly, or need to be handled at a higher level of abstraction than their Nock implementation. You need a specification/axiomatization for arithmetic, and Nock is not a practical one. What else could you do with the Nock implementation of decrement? I don’t think you’re going to test it against anything, or use it at all, if only because you can’t run many programs when basic arithmetic is O(n). Moving arithmetic operations to jets doesn’t make them disappear – you still need to audit and reason about the jet implementations.

Of course, you don’t want to pack everything possible into the bag of primitive arithmetic operations. A reasonable choice is a small set of primitives that can themselves be executed efficiently and that make it possible to implement all other arithmetic operations without changing their computational complexity or otherwise grossly affecting their performance.
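For a concrete sense of the cost: with increment and equality as the only arithmetic, decrement becomes a linear search (Haskell sketch; the canonical Nock decrement is structured the same way):

```haskell
-- Decrement using only increment and equality: count up from zero until
-- the successor matches. O(n) in the value being decremented.
dec :: Integer -> Integer
dec n = go 0
  where
    go k | k + 1 == n = k
         | otherwise  = go (k + 1)
```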

  8. Why no cons?

Normally, (cons A B) is used to create a pair consisting of the results of evaluating A and B. In Nock, this is instead done with [A B] when A is a pair. The decision to indicate pair creation by the first component being a pair is strange, non-standard and unaccounted for. The only justification I can think of is reduced code size, but here the benefit seems too small.

  9. Why no nil-termination?

In Lisp, e.g., (quote t) is a nil-terminated list (quote . (t . nil)). In Nock, [quote t] is a cell without a terminating nil. In general, instead of proper nil-terminated lists, Nock uses improper lists (let’s call them “tuples”) to represent code. Why? To save one atom at the end? The advantage of proper lists is that one can index them uniformly and compute their length. For tuples, indexing the last element works differently from indexing all the other elements (we’ve actually had mistakes related to this in our code generation). Also, in general you can’t infer how many elements a tuple has from its structure alone.
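To make the non-uniformity concrete, here is the axis arithmetic (my own derivation from the slot rules, so worth double-checking):

```haskell
-- Axis of the i-th element (0-based) of a nil-terminated list: uniform.
listAxis :: Integer -> Integer
listAxis i = 2 ^ (i + 2) - 2      -- elements at axes 2, 6, 14, ...

-- Axis of the i-th element of an m-tuple (improper list): the last element
-- breaks the pattern, which is exactly where codegen mistakes creep in.
tupleAxis :: Integer -> Integer -> Integer
tupleAxis m i
  | i < m - 1 = 2 ^ (i + 2) - 2   -- same as the list rule ...
  | otherwise = 2 ^ m - 1         -- ... except for the last element
```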

Obfuscation

I’ve already brought this up once, and at this point the harm it does to day-to-day Nock usage by the Juvix compiler team is limited but not entirely eliminated. We’ve deciphered Nock and partly gotten used to it, but the Juvix team still needs to refer regularly to my Nock write-up. I’d prefer to see the obfuscation simply gone, as unfamiliar non-standard terminology crops up now and then, and Nock code keeps getting presented as chunks of unreadable numbers. This naming obfuscation makes it harder to have a meaningful discussion about Nock language design and implementation choices. A potential future issue is that other people, inside or outside Heliax, may need to understand Anoma’s VM language. Unless preventing people from doing so is a goal, overhauling at least the presentation of Nock seems reasonable.

Nock’s approach to naming is unconventional, to say the least. There are two issues.

  1. Things that should have names don’t.

I mean chiefly that the “combinators” (operators, really) are not named and paths are always presented by their numeric encodings. Even assembly and VM bytecode languages have some human-readable representations – the opcodes have mnemonics to be used when communicating between humans. I don’t buy the argument that you can just learn the operator numbers. You can just learn brainfuck, which doesn’t mean you should use it. There is a good reason to prefer mov rax, rbx over 48 89 D8 – it helps communication. But in Nock you’re apparently supposed to communicate things like [9 2 0 1]. I wrote up a reformulation of Nock hoping that at least the operator names and path notation would catch on.

  2. Practically everything that is named has some strange non-standard name.

Using a new name could be justified for unique concepts, or for a few important variations on common concepts, but not in general. I don’t buy that there are minor differences justifying the use of a new name. It’s splitting hairs.

For example, the main data structure of Nock is S-expressions or binary trees, which are forms of terms or ADTs. These are all well-established notions and there are already too many of them. There is no good reason to introduce another one: Nock nouns are exactly S-expressions. Axis is just a tree path, jam is serialize, cue is deserialize, scry is fetch, slot is addressing, and so on. Why not just ditch all this obscure terminology in favor of more standard names everywhere?

Example: what is jetting anyway?

Recently, I’ve been trying to understand on my own how exactly jetting works, without taking up any of the engineering team’s time. To be honest, I’m not completely sure I succeeded, but here is my rough understanding. Functions can be designated, via hints, to be replaced by native implementations. There is a hash table mapping hashes of function code (plus the relevant part of the context – cores or whatever) to native implementations. On a function call, you look up the native implementation by the hash of the Nock function code. The hashes themselves must be pre-computed (or memoized) and stored somewhere for this to make any sense efficiency-wise. The actual implementation is a bit more complex. The context is organized into a hierarchical data structure, because modules/objects (cores or whatever) can be nested. So you recursively compute the hashes of the parents of your function/module and check that all of them match.
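In code, my understanding amounts to something like the following Haskell sketch (all names are mine; the real mechanism hashes cores hierarchically and caches the results):

```haskell
import qualified Data.Map as Map

data Noun = Atom Integer | Cell Noun Noun

type Hash = Integer              -- stand-in for a real noun hash
type Jet  = Noun -> Noun         -- a registered native implementation

-- Toy structural hash; a real VM uses a proper, memoized hash.
hashNoun :: Noun -> Hash
hashNoun (Atom n)   = n * 2654435761
hashNoun (Cell a b) = hashNoun a * 31 + hashNoun b

-- Registry: hash of function code (plus relevant context) -> native code.
jets :: Map.Map Hash Jet
jets = Map.empty                 -- populated by the runtime at startup

-- On a call, try the native implementation before falling back to
-- interpreting the Nock code.
call :: (Noun -> Noun -> Noun) -> Noun -> Noun -> Noun
call interpret code arg =
  case Map.lookup (hashNoun code) jets of
    Just jet -> jet arg
    Nothing  -> interpret code arg
```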

This understanding derives mostly from my conversation with ChatGPT and this link which I found later. My attempts to understand jetting from the official documentation failed. Why? When I start reading it, I get bombarded with idiosyncratic concepts: arms, cores, batteries, piers, axes, blocks of utterly unreadable hoon, sigcens, hooks, gates, pins, bunted samples, slamming, etc. I try to look up in the glossary what a core is: a cell of a battery and a payload, something similar to an object. What is a battery according to the glossary? The head of a core!

Few people have the time or dedication to re-learn the entire computer science vocabulary just to understand a simple VM implementation technique. It’s not a reasonable use of one’s time.

So I look into the Elixir codebase to maybe try to understand jetting from its implementation. But then again I’m confronted with mugs, arms, cores, gates, doors, samples, and so on. These are perhaps clear to people steeped in Hoon & Urbit, but not to the rest of the world. It’s not too bad though – I roughly understand what the implementation of nock does on function calls (the “nines”). The amount of esoteric terminology in the Elixir codebase is limited and manageable, but why have it at all?

This whole situation reminds me of a parser I once wrote in QBasic when I was a teenager. I decided it would be fun to have a bunch of global variables (all signifying valid concepts!) and name them after different fruits: apple, orange, pineapple, strawberry, etc. One can guess how that ultimately ended.

My point is: you don’t need all these incomprehensible names to pre-hash the code of designated functions and then look up the hashes on function calls to fetch native implementations. You shouldn’t need any esoteric terminology to describe the actual, more complex algorithm either. One can explain this using normal CS language. It’s not some earth-shattering invention deserving a whole separate vocabulary.

The above example is not particularly contrived. To independently implement a Nock VM, one needs to understand jetting – it’s not feasible to run almost anything without it, because even basic arithmetic operations are jetted. To target Nock in a compiler, it also helps to know how the VM is supposed to work. And it’s hard to understand even simple things when they’re communicated in some made-up Martian computing vocabulary.

Potential PR problem

This is not a technical issue and I’m somewhat hesitant about bringing it up. But in light of recent developments, I’m beginning to have misgivings about this aspect and I think it deserves some attention.

Namely, Nock was initially designed by, and still remains broadly associated with, an extremely controversial alt-right figure: Curtis Yarvin. Personally, I’m not much bothered by this fact alone – technology should be evaluated on its own merits, and I don’t see its independent use as endorsing the personal views of whoever initially created it. But apparently not all of society sees it this way. This wouldn’t be a problem if Yarvin had remained an unknown, obscure blogger, but he seems to be gaining mainstream recognizability (see his interview in the NYT; recently I’ve been reading about him in European newspapers). If he becomes a widely known polarizing public figure, on the level of Bannon or Yiannopoulos, and Anoma gains broader popularity with Nock advertised as its underlying VM language, it seems inevitable that some people’s first reaction will be: “Oh, so you’re using Yarvin’s language inside Anoma and promoting it?”. I don’t want to be drawn into that kind of shitshow, and presumably neither does Anoma.

My point here is not even about the specifics of Yarvin’s politics, but about avoiding being drawn into any kind of polarizing political controversy as a result of professional activities. The issue is the possibility of a future controversy, regardless of the specifics of it.

Nothing very special

I would at least suggest rebranding and “mainstreaming” Nock. Sticking with the Nock brand would be more justified if it were some ground-breaking technology, but it isn’t. It is just a purely functional, call-by-value, homoiconic, Lisp-like language without binder names. Beyond the obscure terminology and idiosyncratic design, there isn’t much special about it. There is no significant ecosystem around it either. There is no native compiler for it, let alone an incremental or just-in-time one. The only existing, potentially semi-efficient Nock interpreter is part of Urbit. Other available interpreters seem to be more or less naive, often without jetting.

One could equally well choose a purely functional Scheme-like language analogous to Nock (easing the transition), but with more mainstream Scheme/Lisp terminology and design choices, giving credit to Nock where there actually is some useful novelty. I’m not sure where that is (maybe jetting or subject-orientation), which doesn’t mean there absolutely isn’t any – it’s just never adequately explained.

Sock

  • “Schemish Nock”

The presentation below is only a sketch, with some aspects purposefully left under-defined. The point is: you can create a language that fulfills the desired criteria (as I understand them), but is more standard and understandable and better suited as both compilation source and target.

The basic data structure is the S-expression. An S-expression is either an atom (number, binary blob, …) or a cell (t1 . t2). We use a for atoms, n, k for numeric atoms, s, t for arbitrary S-expressions. As usual in Lisp, by e.g. (t1 t2 t3) we denote the nil-terminated list (t1 . (t2 . (t3 . nil))). The atom nil can be just 0, or one can decide that it’s something else.

Wherever possible, lists should be optimized by the compiler/VM into fixed-size arrays or contiguous stacks, but we use lists for standardness, simplicity and uniformity, and because we don’t care too much about performance. This intended optimization justifies having list indexing among the primitive operations.

The evaluation relation has the form s |- t => t' meaning: t evaluates with subject s to result t' . Renaming “environment” to “subject” is actually one of the few reasonable naming choices in Nock/Urbit. The subject is often used as a single object rather than as an environment, it can be reified, and “subject” already has an intuitive CS meaning (in contrast to, e.g., “gate”, “door”, or “arm”).

When t = (t0 . (..(tn . rest))..), we use the notation t[n] = tn (nth element, zero-based), t[n..] = (tn . rest) (elements from nth on) and t{tn'}[n] = (t0 . (..(tn-1 . (tn' . rest)))..) (replacing nth element).
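In Haskell terms, the data structure and the notation above correspond to the following sketch:

```haskell
data SExpr = A Integer | SExpr :. SExpr   -- atom or cell (t1 . t2)
infixr 5 :.

nth :: Int -> SExpr -> SExpr              -- t[n], zero-based
nth 0 (h :. _) = h
nth n (_ :. t) = nth (n - 1) t
nth _ (A _)    = error "nth: atom"

dropN :: Int -> SExpr -> SExpr            -- t[n..]
dropN 0 t        = t
dropN n (_ :. t) = dropN (n - 1) t
dropN _ (A _)    = error "dropN: atom"

putN :: Int -> SExpr -> SExpr -> SExpr    -- t{t'}[n]
putN 0 v (_ :. t) = v :. t
putN n v (h :. t) = h :. putN (n - 1) v t
putN _ _ (A _)    = error "putN: atom"
```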

Basic operations (a small interpreter sketch follows the list):

  • atoms evaluate to themselves: s |- a => a
  • cons creates a cell:
    • s |- (cons t1 t2) => (t1' . t2') if s |- t1 => t1' and s |- t2 => t2'
  • cell? checks if the argument is a cell (0 is true, as in Nock):
    • s |- (cell? t) => 0 if s |- t => (t1 . t2)
    • s |- (cell? t) => 1 if s |- t => a
  • switch chooses a branch based on a numeric argument value:
    • s |- (switch t (t0 .. tn)) => tk' if s |- t => k and 0 <= k < n and s |- tk => tk'
    • s |- (switch t (t0 .. tn)) => tn' if s |- t => k and k >= n and s |- tn => tn'
  • quote delays evaluation:
    • s |- (quote t) => t
  • eval-with evaluates an S-expression with a given subject:
    • s |- (eval-with s' t) => t'' if s |- s' => s'', s |- t => t' and s'' |- t' => t''
  • drop returns the subject with first n elements dropped:
    • s |- (drop n) => s[n..]
  • get accesses the nth element of the subject:
    • s |- (get n) => s[n]
  • put replaces the nth element in a given list:
    • s |- (put n t1 t2) => t2'{t1'}[n] if s |- t1 => t1' and s |- t2 => t2'
  • oper performs a basic operation on atoms:
    • s |- (oper op (t1 .. tn)) => a if s |- tk => ak for k=1..n and op(a1, .., an) = a
    • op is one of a few fixed operations on atoms
  • hint signifies a hint:
    • s |- (hint h t) => t' if s |- t => t'
    • h is the hint, left under-defined
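The promised interpreter sketch, transcribing the rules above into Haskell (eval-with, oper and hint are omitted since they depend on a concrete code encoding, which this sketch leaves open; Term is a convenience AST, not a fixed representation):

```haskell
data SExpr = Atom Integer | Cell SExpr SExpr deriving Show

data Term
  = Lit SExpr              -- covers self-evaluating atoms and quote
  | Cons Term Term
  | IsCell Term
  | Switch Term [Term]
  | Drop Int
  | Get Int
  | Put Int Term Term

nth :: Int -> SExpr -> SExpr
nth 0 (Cell h _) = h
nth n (Cell _ t) = nth (n - 1) t
nth _ (Atom _)   = error "nth: atom"

dropN :: Int -> SExpr -> SExpr
dropN 0 t          = t
dropN n (Cell _ t) = dropN (n - 1) t
dropN _ (Atom _)   = error "dropN: atom"

putN :: Int -> SExpr -> SExpr -> SExpr
putN 0 v (Cell _ t) = Cell v t
putN n v (Cell h t) = Cell h (putN (n - 1) v t)
putN _ _ (Atom _)   = error "putN: atom"

eval :: SExpr -> Term -> SExpr            -- s |- t => t'
eval _ (Lit v)       = v
eval s (Cons a b)    = Cell (eval s a) (eval s b)
eval s (IsCell t)    = case eval s t of { Cell _ _ -> Atom 0; _ -> Atom 1 }
eval s (Switch t bs) = case eval s t of
  Atom k -> eval s (bs !! min (fromIntegral k) (length bs - 1))
  _      -> error "switch: selector is a cell"
eval s (Drop n)      = dropN n s
eval s (Get n)       = nth n s
eval s (Put n a b)   = putN n (eval s a) (eval s b)
```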

Abbreviations:

  • (self) is (drop 0)
  • (head) is (get 0)
  • (tail) is (drop 1)
  • (tuple t1 .. tn) is (cons t1 (.. (cons tn-1 tn))..)
  • (list t1 .. tn) is (tuple t1 .. tn nil)
  • (if t t1 t2) is (switch t (t1 t2))

Optional operators:

  • These can be defined using basic operators, but are good to have as primitives anyway (if only to avoid overusing eval-with when it’s not necessary).
  • with changes the subject locally:
    • s |- (with t1 t2) => t2' if s |- t1 => t1' and t1' |- t2 => t2'
    • (with t1 t2) is equivalent to (eval-with t1 (quote t2))
  • push locally pushes a value onto the subject:
    • s |- (push t1 t2) => t2' if s |- t1 => t1' and (t1' . s) |- t2 => t2'
    • (push t1 t2) is equivalent to (with (cons t1 (self)) t2)
  • call-with calls a function with a given subject and a fixed number of arguments:
    • s |- (call-with s' f (t1 .. tn)) => t' if s |- s' => s'', s'' |- f => f', s |- tk => tk' for k=1..n, and (tn' . (.. (t1' . s''))..) |- f' => t'
    • The reverse order of arguments when evaluating f' makes sense with curried functions (see “closure operators” below).
    • (call-with s' f (t1 .. tn)) is equivalent to
(push s'
	(eval-with
		(tuple
			(with (tail) tn)
			..
			(with (tail) t1)
			(head))
		(with (head) f)))
  • bcall calls a builtin with a fixed number of arguments:
    • s |- (bcall b (t1 .. tn)) => t if s |- tk => tk' for k=1..n and Builtin(b)(t1', .., tn') = t.
    • Builtins differ from the basic atomic operations of oper in that they are not fixed by the language definition, but can be registered by the runtime. They can be implemented as jets or something else.
    • b identifies the builtin and is left purposefully under-specified.

Closure operators:

  • A closure is represented by <c, n, s> where c is the function code, n is the number of expected arguments, and s is the subject. This can be interpreted as (c n s) to make closures inspectable, or as a separate term form to make them opaque. The arguments come at the front of the subject in reverse order.
  • If closures are interpreted as inspectable, all closure operators can be implemented using other operators. For compilation of curried functional languages, it still makes sense to have them as primitives or builtins (either calloc and capply, or calloc, cargsnum, ccall and cextend).
  • calloc allocates a closure:
    • s |- (calloc s1 tf (t1 .. tn)) => <c, n, s3> if s |- s1 => s2, s2 |- tf => c, s |- tk => tk' for k=1..n, and s3 = (t1' . (.. (tn' . s2))..)
  • cargsnum returns the number of arguments a closure expects:
    • s |- (cargsnum t) => n if s |- t => <c, n, s'>
  • ccall calls a closure with a fixed number of arguments:
    • s |- (ccall t (t1 .. tn)) => t' if s |- t => <c, n, s'>, s |- tk => tk' for k=1..n, and (tn' . (.. (t1' . s'))..) |- c => t'
  • cextend extends a closure with a fixed number of arguments:
    • s |- (cextend t (t1 .. tn)) => <c, m - n, s''> if s |- t => <c, m, s'>, m > n, s |- tk => tk' for k=1..n, and s'' = (tn' . (.. (t1' . s'))..)
  • capply calls or extends a closure with a fixed number of arguments:
    • s |- (capply t (t1 .. tn)) => t' if s |- t => <c, n, s'>, s |- tk => tk' for k=1..n, and (tn' . (.. (t1' . s'))..) |- c => t'
    • s |- (capply t (t1 .. tn)) => <c, m - n, s''> if s |- t => <c, m, s'>, m > n, s |- tk => tk' for k=1..n, and s'' = (tn' . (.. (t1' . s'))..)
    • s |- (capply t (t1 .. tn)) => t'' if s |- t => <c, m, s'>, m < n, s |- tk => tk' for k=1..m, (tm' . (.. (t1' . s'))..) |- c => t' and s |- (capply (quote t') (tm+1 .. tn)) => t''
    • capply can be implemented with cargsnum, ccall and cextend (sketched below). That’s what the Juvix compiler currently does at the JuvixTree level.
  • (optional) reify returns an inspectable representation of a closure:
    • s |- (reify t) => (c n s) if s |- t => <c, n, s>
    • If we really want closures to be inspectable, then for ease of analysis it may be better to achieve this with explicit reify than to directly represent them as white-box S-expressions.
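The promised sketch of that reduction, in Haskell; Value and Closure here are models of <c, n, s>, not actual Juvix or Nock representations:

```haskell
data Value = VClosure Closure | VAtom Integer

data Closure = Closure
  { clCode :: [Value] -> Value   -- models the code c applied in subject s
  , clArgs :: Int }              -- expected number of arguments n

ccall :: Closure -> [Value] -> Value
ccall = clCode

cextend :: Closure -> [Value] -> Closure
cextend c args =
  Closure (\rest -> clCode c (args ++ rest)) (clArgs c - length args)

-- capply in terms of cargsnum (here clArgs), ccall and cextend.
capply :: Closure -> [Value] -> Value
capply c args = case compare (length args) n of
  EQ -> ccall c args                       -- exactly n arguments: plain call
  LT -> VClosure (cextend c args)          -- too few: extend the closure
  GT -> case ccall c (take n args) of      -- too many: call, then re-apply
          VClosure c' -> capply c' (drop n args)
          _           -> error "capply: over-application of a non-closure"
  where n = clArgs c
```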

Execution hints:

  • (hint (enter n) t) indicates that t will push at most n elements onto the subject. This essentially indicates the number of local variables and helps the interpreter to convert function frames into fixed-size arrays.

The link to https://github.com/anoma/nock-benchmarks is not working for me

It may be relevant to note that - as far as I understand - there has been only one public document regarding the selection of the VM language, namely this thread.

It is indeed worth noting that the requirements are fairly general, i.e. there are many languages that fit. I’ll post some of the central ones here for visibility[1]:

  • Deterministic
  • Support concurrency
  • Support precise metering of computations
  • Support some form of “reads”

Given all this, it may be worth indeed discussing the language choice once again before the sunk cost becomes even more sunky and costy.


  1. This may also be a good opportunity to reflect on whether the current NockmaVM roadmap supports all previously stated requirements, or more generally whether it should.


I forgot to set the permissions. It should work now for anyone in the Anoma or Heliax GitHub groups.

“Support precise metering of computations”

I think the measuring itself is not a problem. One can just have a global variable in the VM or compiled code to count the gas, increased by a fixed, specified amount on each VM instruction executed (depending on the instruction). One can insert checks for running out of gas on basic block boundaries. If one manages to induce the compiler to put the gas variable into a CPU register and to avoid branch mispredictions on the checks, then the metering is almost free.
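A minimal sketch of this scheme in an interpreter (Haskell, with a made-up instruction set and cost table; in compiled code the counter would sit in a register, with checks on basic-block boundaries):

```haskell
import Data.IORef
import Control.Monad (when)

data Instr = Nop | Inc | Jump Int         -- hypothetical instruction set

cost :: Instr -> Int                      -- fixed cost per instruction
cost Nop      = 1
cost Inc      = 1
cost (Jump _) = 2

-- Charge gas before executing an instruction; out-of-gas aborts the program.
charge :: IORef Int -> Instr -> IO ()
charge gas i = do
  g <- readIORef gas
  when (g < cost i) (ioError (userError "out of gas"))
  writeIORef gas (g - cost i)
```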

The interesting part is assigning costs to instructions/operations in a way that somehow reflects the time it actually takes to execute them. Nock as it stands doesn’t make this easier, because of the mismatch I mentioned between Nock operations and what one can easily implement on typical hardware.

Well, it gets more complicated with concurrency, when one VM program can be executed in multiple threads with common gas accounting, if that’s what we want.