Current State of References

ArtemG · January 27, 2025, 2:03pm

Per @cwgoes's request, I am writing a quick summary of what the current (partial) solution to references is.

Prior to that, let me write two specific points that references are solving in the current iteration:

They allow for lookups, replacing the usual indexer functionality with calls such as “give me the value of key K” and filtering the result appropriately, all of which can now be specified at the Nockma level.
They minimize the size of what is sent to the Node over the network. Namely, instead of sending a term term, possibly fully compiled, you can send a reference to it as a 12 call. E.g. instead of sending the whole standard library, Juvix can just reference it, saving a large amount of space for networking.

Now, with regard to how this proposal impacts the current Juvix/Node codebase, I would say: very little. No types need changing and little logic needs to be added.

With that said, here is the (simplified for readability purposes) workflow:

When a Juvix user wants to reference a term term : A, they instantiate it by calling anomaGet (sha256 (jam termNoun)) : A, where termNoun is the representation of term in Nock.
anomaGet evaluates to a Nockma 12 call, roughly of the form [12 [1 type] 1 key_to_read] (sic). This also carries ID data for pure reads and other minor details.
If evaluated offline, i.e. in the client, this has the following semantics:
- Look for key key_to_read in local cache at specified time.
- If it has a value, produce said value.
- If not, launch a read-only transaction to the specified Node ID at the most recent Node time.
- Produce the replied value or crash if absent (this is pending a Maybe-like fix for Juvix).
- This allows for lookups for local actions and gets rid of the necessity to have most of the current indexer as implemented for the 0.1 devnet.
After calling prove, a transaction candidate gets produced and sent to the Node, possibly with unevaluated 12 calls, i.e. with references instead of actual data, saving space, as mentioned prior.
The transaction candidate gets a worker to process it, and a NockmaVM gets launched, evaluating the sent candidate.
This evaluates any 12 calls that are still unevaluated and produces a transaction datatype instance without references.
We call verify on the produced instance.
If all checks pass, i.e. the state change is enabled by the RM, we then store whatever blobs in the transaction structure are marked as to be stored.
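
The client-side ("offline") lookup steps above can be sketched roughly as follows. This is a minimal Python illustration and not the Anoma client code: scry, ReferenceNotFound and read_from_node are hypothetical names, the local cache is a plain dictionary, and the time and Node ID parameters of the real 12 call are left out.

```python
from typing import Callable, Dict, Optional


class ReferenceNotFound(Exception):
    """The key is neither in the local cache nor known to the Node."""


def scry(
    key: bytes,
    cache: Dict[bytes, object],
    read_from_node: Optional[Callable[[bytes], Optional[object]]] = None,
) -> object:
    # Step 1: look for the key in the local cache.
    if key in cache:
        return cache[key]
    # Step 2: otherwise send a read-only request to the Node, if one is reachable.
    # (read_from_node is a hypothetical stand-in for that request.)
    value = read_from_node(key) if read_from_node is not None else None
    # Step 3: produce the replied value, or crash if it is absent.
    if value is None:
        raise ReferenceNotFound(key.hex())
    return value
```

With no reachable Node (read_from_node left as None) and a cache miss, the sketch raises, which corresponds to the "crash if absent" step.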

Now, I should note that this does not actually solve the problem of nested references. I am still unsure how to solve it, as we currently just read the appropriate hash from storage. Hopefully this becomes clearer once we test this system out.

Tagging @paulcadman @Lukasz @ray @mariari, hoping that they correct any mistakes I made in the above presentation and raise their concerns if there are any.

cwgoes · January 29, 2025, 4:08am

Specifically, here, the keys are something like succinct binding commitments, right (e.g. a hash of the value referenced)? We should be careful to distinguish these from some other kind of key-value store (which exists at least in the executor engine). In general I think it’s better to avoid overuse of the word “key” which is already terribly overloaded.

I don’t quite follow this. If a user already has termNoun, why would they need to call anomaGet? Are you talking about how to compute a reference (from a known piece of data), or how to fetch a piece of data from a known reference?

This seems destined to fail if evaluated offline; I’m not quite following here. Is this instead part of online evaluation?

This evaluation semantics also implies an intended safety-like behavioral property, which I understand as something like “if a reference read of key k returns value v at time t_n, then at all times t_{n+k} a reference read of key k will either return v or crash”, and probably a further liveness-like property that, under some assumptions of chosen storage providers behaving appropriately, the reference read at t_{n+k} will in fact return v. Is this also what you have in mind? It would be good to spell this out in more detail.
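
One way to make the safety-like property concrete is as a check over a log of reference reads. The sketch below is only an illustration under assumptions: the Observation shape and the name violates_safety are hypothetical, a crash is modelled as None, and the liveness-like property is not covered.

```python
from typing import Dict, Iterable, Optional, Tuple

# (time, key, value), where value is None when the read crashed.
Observation = Tuple[int, str, Optional[str]]


def violates_safety(observations: Iterable[Observation]) -> bool:
    """Return True if some key was read with two different values over time."""
    first_seen: Dict[str, str] = {}
    for _, key, value in sorted(observations, key=lambda o: o[0]):
        if value is None:
            continue  # a crash is always allowed by the property
        # Remember the first value seen for this key; any later read must match it.
        if first_seen.setdefault(key, value) != value:
            return True
    return False
```

For example, the log [(1, "k", "v"), (2, "k", None), (3, "k", "v")] satisfies the property, while [(1, "k", "v"), (2, "k", "w")] does not.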

I don’t understand, where would nested references be located? If we fully evaluate the transaction function and compute a Transaction structure (as per the RM definition), all references should be evaluated during transaction function execution. What case are you thinking of here? Are you imagining references in resource logic data or something like that? If so I think we should just prohibit this for now (i.e. the RL verification would fail).

ArtemG · January 29, 2025, 11:28am

For blob storage, yes. The value is content-addressed. Namely, the binding commitment is computed as the sha256 of the jammed noun, as described further down in the post. But in general this need not be the case, as I will elaborate in the point below.

I do not think that it makes sense to do so for the purposes of scrying. Per the RM specification, scry allows for storage lookups. We could limit that to blob storage only, but why would we need to do so? There are also key(space)s specified by the RM which are “special”, like anoma/transparent/commitments, which are currently being fetched by the indexer but can instead be programmable in Juvix. That is, we can fetch things like commitments, nullifiers, roots, etc. straight in a Juvix program instead of somehow barricading them inside some Elixir logic. So I see no reason why we would need to limit this to blob-storage lookups only, as this is a) good for the workflow and b) easy to do.

Also, I use key-value store in the sense of the Storage engine here which corresponds to timestamped key-value store on the Node side in Anoma.Node.Transaction.Storage and locally as Anoma.Client.Storage. I have no idea what an executor key-value store is.

As mentioned in point 1, yes, that is exactly what I am talking about. If you want to compute a reference for blob storage, you jam the noun and then sha256 it. Otherwise, if you just know your reference hash (e.g. it was gossiped to you somehow), just place it into the anomaGet argument slot.
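
As a rough illustration of "jam the noun and then sha256 it", here is a minimal Python sketch. Note the assumptions: serialize is a simple stand-in encoding and NOT the real Nock jam, and the names serialize and reference are hypothetical. Nouns are modelled as non-negative ints (atoms) or 2-tuples (cells).

```python
import hashlib


def serialize(noun) -> bytes:
    """Stand-in for jam (NOT the real encoding): canonical bytes for a noun."""
    if isinstance(noun, int):
        body = str(noun).encode()
        # Length-prefix the atom so the encoding is unambiguous.
        return b"a" + str(len(body)).encode() + b":" + body
    head, tail = noun
    return b"c" + serialize(head) + serialize(tail)


def reference(noun) -> bytes:
    """Content address of a noun: sha256 over its serialized form."""
    return hashlib.sha256(serialize(noun)).digest()


# Blob storage keyed by reference; a known hash can be looked up directly.
term_noun = (1, (2, 3))
storage = {reference(term_noun): term_noun}
gossiped_ref = reference(term_noun)  # e.g. received from someone else
assert storage[gossiped_ref] == term_noun
```

Whoever holds the noun can recompute the reference, and whoever only holds the hash can use it as the lookup key directly.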

Of course, for special lookups like commitments etc. that we do not currently treat, this is different.

If you mean by that “fully offline”, i.e. you have no connection to any Node then yes. Remember, this is specifically the case when you don’t have the needed reference inside local cache. So of course if you don’t have something locally and you have nowhere to get it from (i.e. no Node online to fetch it from) you just crash. Is there a problem with that?

Otherwise, if you have connection to some Node, you send a read-only request, it produces the value and the value is sent to the Client putting it into the 12 eval slot. Wanted to make sure to elaborate on this once again. You can check out the example of this here.

Yes. However, this is just part of the 12 semantics, so I am unsure there is anything to be specified at the level we are talking about.

I think I may not have been fully explicit here. Suppose for some reason we stored some special thingy, e.g. the Anoma standard library lib. It lies in storage under the hash sha256(jam(lib)). Now suppose I want to store some core, I dunno, maybe some custom gate with logic logic that I made and want to be available to some friends, and I want to use the standard library. There are two ways of storing this: either as a core [logic sample lib], which is large, or with a reference to the library [logic sample sha256(jam(lib))], which is usually much smaller. So if we want to save space, we want to consider this second variant of storing stuff.

I should say that I don’t think any problem will arise during the dereferencing phase, i.e. during transaction function evaluation, as the 12 call has a type field that can help the user here. However, there is a problem in that [logic sample lib] and [logic sample sha256(jam(lib))] are stored under different keys, since their jams differ. That’s the part that I don’t know how to handle yet.
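
The size difference and the key mismatch can be seen in a small Python sketch. As a loud caveat: serialize below is a stand-in encoding and NOT the real jam, the values chosen for logic, sample and lib are made up, and key_of is a hypothetical name.

```python
import hashlib


def serialize(noun) -> bytes:
    """Stand-in for jam (NOT the real encoding): canonical bytes for a noun."""
    if isinstance(noun, int):
        body = str(noun).encode()
        return b"a" + str(len(body)).encode() + b":" + body
    head, tail = noun
    return b"c" + serialize(head) + serialize(tail)


def key_of(noun) -> bytes:
    """Storage key of a noun: sha256 over its serialized form."""
    return hashlib.sha256(serialize(noun)).digest()


# A made-up "large" library: a right-nested list of 200 atoms.
lib = 0
for i in range(200):
    lib = (i, lib)

logic, sample = 42, 7  # made-up atoms standing in for real logic and sample
lib_ref = int.from_bytes(key_of(lib), "big")  # the reference, stored as an atom

inlined = (logic, (sample, lib))      # [logic sample lib]
by_ref = (logic, (sample, lib_ref))   # [logic sample sha256(jam(lib))]

# The by-reference core is much smaller on the wire...
print(len(serialize(inlined)), len(serialize(by_ref)))
# ...but it is stored under a different key than the inlined core.
print(key_of(inlined) != key_of(by_ref))
```

Both cores describe the same gate once dereferenced, yet a content-addressed store treats them as two unrelated entries.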

I agree though that for now let’s just assume everything is fully dereferenced in storage.

cwgoes · January 30, 2025, 6:17am

Ah, I agree that we should expose everything via the scry interface - I just think we should clearly distinguish between the case where the key is a succinct binding commitment to the value and where it is not, because these have different trust assumptions (since in the former case, we can immediately verify that the retrieved value is correct).

Yes, I see. I think we’re using the terminology slightly differently. To me a “reference” is the hash (or, more generally, succinct binding commitment), there is no separate “reference hash” (which doesn’t make sense IMO because the hash is the hash of the data/value, which we definitely wouldn’t call “the reference”).

Not necessarily, but you started the list with “If evaluated offline”, so I wanted to check.

Ah, interesting. Can I find Nock 12 semantics specified somewhere? I looked here but this only has up to instruction 11. @AHart also wrote something here which makes sense to me operationally but doesn’t say anything about expected or guaranteed properties. It seems to me like succinct binding commitments are going to have different trust assumptions (safety is easy to guarantee) than other kinds of lookups and we’ll need to clearly delineate that. Other kinds of lookups would have trust assumptions based on which node you contact - so I expect we’ll need some sort of “trust-assumption-aware” scrying if we want to support this as an option in transaction function processing.

That makes sense, although wouldn’t we also want some flag that indicates that sha256(jam(lib)) should be interpreted as a reference?

I see. Is this an operational problem, or would it just potentially make the caching less efficient (since nested references vs just data would be cached differently)? In principle it seems to me like we could design some alternative version of jam which always uses references and use this for the cache.

ArtemG · January 30, 2025, 12:06pm

I would say scrying will probably change a bit when we have multiple nodes to contact. For referential transparency we probably need to explicitly state which Nodes we contact. Otherwise we lose purity. @ray tagging you for reference.

But IIRC the general requirement for Nockma as a Juvix backend is referential transparency, so properties like “evaluation of the same Nock call always returns the same value or hangs forever” need to be guaranteed at all times, if I understand correctly.

Re the semantics specification, the best place to look is the RM report itself.

That is possible, but will screw with our typing. Pending investigation.

The latter, so I wouldn’t worry about it much for now. It is however something to think about in the future.

ray · January 30, 2025, 1:20pm

Content-addressed blobs don’t have normal trust domains (we move all the trust onto trusting the hash) which is why they have especially simple scries; you passed a hash, you got back something whose digest is that hash, that settles the matter.
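
A minimal Python sketch of that check, assuming sha256 over the returned bytes as described earlier in the thread. The names verified_read and fetch are hypothetical; fetch stands in for whatever storage provider or Node answers the scry.

```python
import hashlib
from typing import Callable


def verified_read(ref: bytes, fetch: Callable[[bytes], bytes]) -> bytes:
    """Fetch a blob by its hash and accept it only if its digest matches."""
    blob = fetch(ref)
    if hashlib.sha256(blob).digest() != ref:
        # The provider returned something else; no trust in it is needed to see that.
        raise ValueError("digest mismatch: reply does not match the reference")
    return blob
```

The caller needs no assumptions about who answered: a wrong reply is detected locally, which is why this case is simpler than lookups of commitments, nullifiers or roots.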

ArtemG · January 30, 2025, 1:21pm

Here we were talking about not just blobs but things like commitments, nullifiers, roots, etc. All the “special” namespaces reserved by the RMs.
