Prompted by a discussion with @mariari concerning the devnet v0.2 product requirements, I thought it could be useful to write up my understanding of what users need to be able to do client-side with the resource machine (for the immediate future).
Let’s start with a few definitions:
A client is a lightweight node running on an edge device, such as the user’s phone or browser, which is not always online and has limited compute, bandwidth, and storage resources.
In the immediate future, I think that it’s important that users are able to:
Run transaction creation functions (written in Juvix) locally, which should be able to:
a. Access state local to the client device (e.g. request signatures and view shielded notes)
b. Access state not necessarily stored by the client device (e.g. the data referenced by some particular resources as known to some controller).
Simulate transaction functions as they would execute on a controller post-ordering, to the best of the client’s knowledge (e.g. using currently known state, which might not be the state post-ordering).
In terms of RM execution support, both (1) and (2) require the ability to cleverly interpret state reads (SCRY calls in Anockma), namely:
Where the data queried is known to the client: return the data immediately.
Where the data queried is not known to the client: pause the VM, query it from a remote node, cache it locally, and return the data.
Periodically prune the local cache according to some target maximum storage use (LRU).
This is the same as the way the basic distributed cache works, except that the data worth caching is determined by the kinds of transactions the user is crafting (which is yet another reason that, at least in the long term, the client should be a configuration of the node, not a separate piece of software).
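To make the intended scry handling concrete, here's a minimal sketch in Elixir (matching the examples later in this thread); the Cache and RemoteNode modules are hypothetical names, not an existing API:
defmodule ClientScry do
  @max_cache_bytes 64 * 1024 * 1024

  # Resolve a scry by key: serve from the local cache when possible;
  # otherwise fetch from a remote node, cache the result, and return it.
  def scry(key) do
    case Cache.get(key) do
      {:ok, data} ->
        data

      :miss ->
        data = RemoteNode.query(key)
        :ok = Cache.put(key, data)
        data
    end
  end

  # Periodically prune least-recently-used entries down to the target size.
  def prune do
    Cache.evict_lru(@max_cache_bytes)
  end
end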
(1) also requires:
Some special builtins - "sign" or equivalent - which redirect signature creation requests to the identity machine running locally (which actually stores, or has handles to, the user's private key material); see the sketch after this list.
Interface work to make crafting transactions involving shielded resources (visible only to the user with their viewing keys) easy. In general, decryption of this state should be handled outside the VM environment, and the VM environment running on a client should be able to query and view any shielded resources which the user has viewing keys for. We should write this in a way which will be compatible with future private state query techniques.
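As a rough illustration of the "sign" builtin mentioned above, here is a sketch; IdentityMachine and its function are hypothetical names standing in for whatever locally holds (handles to) the key material:
defmodule ClientBuiltins do
  # "sign" never takes key material directly; it forwards the request
  # to the locally running identity machine, which holds (or has
  # handles to) the user's private keys.
  def sign(identity_handle, message) do
    IdentityMachine.request_signature(identity_handle, message)
  end
end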
(2) also requires simulating the environment in which the controller would execute the transaction function, including possible timestamps, the identity of the controller (if this is available to transaction functions to read), etc. I don’t expect this to be particularly difficult.
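For instance, the simulated environment could be little more than a map of best-guess values; this sketch is illustrative only, and all field names are assumptions:
defmodule Simulation do
  # Illustrative only: a client's best guess at the controller's
  # post-ordering environment.
  def env(controller_id, known_state_root) do
    %{
      controller: controller_id,     # if transaction functions can read it
      timestamp: DateTime.utc_now(), # a guess; the real ordering time may differ
      state_view: known_state_root   # currently known state, not state post-ordering
    }
  end
end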
As far as I am aware, this will be sufficient for the immediate future. Feedback welcome!
Hello everyone, I want to post Engineering's thoughts on what a client is and what client-side proving ought to be for both the v0.1 and v0.2 devnets.
A Bit About Clients
I want to avoid using the word client to describe "lightweight nodes"; if that's what we mean, we should just call them nodes, since a lightweight node is simply one kind of node.
I want to instead use client as a useful term to distinguish parts of an Anoma implementation. I believe that if we don't do this, we will mix up the transaction life cycle with many components, which I believe has led to parts like the Identity aspect of Anoma being neglected in the life cycle of the Anoma system.
For this, I’d like to define the following terms:
The Anoma node is the replicated transaction subsystem of Anoma.
This includes things like: consensus, transaction candidates, ZKVM verifiers.
The Anoma client is a full environment for operating around the Anoma Node.
This includes things like: Anockma, the ZKVM, local storage, ZKVM proving, etc.
These definitions should be iterated upon, but they should give a feeling for how these components interoperate.
A good diagram can be found in our client vs node architecture meeting.
Here we lay out which component belongs where, and how they interact with the system. Something important to note is that under this paradigm, it is very obvious where we can add Identities and where they belong in the transaction life cycle. Namely, the client workflow mostly comes in the form of trying to create transactions to send to the node, with a different set of concerns and data available.
A Real Plan for References in Anoma
References show up quite a bit in Anoma (in a resource's label, logic proofs, and value, and in an Action's app-data).
However, what work would it take to actually support references?
@cwgoes in the OP points out various requirements; however, I don't believe it accurately accounts for the work that is required to support this feature.
In Engineering we made a Wardley map of what is required to get a referenced resource machine (RRM).
Values to the left are less specified and more experimental, with most of the features here being in their genesis phase of research and development.
So if we traverse this map, we notice there are a few key areas of development that need to happen:
We need the construction of a client-side VM.
We need to specify how and when dereferencing happens (hence the dereferencing protocol).
For now let us focus on the first chunk of nodes. These nodes deal with the Client Side VM.
The Client Side VM is simply an execution environment in which Nock functions run. This environment mainly has to provide client-side scry to any given Nock code.
This part is important: how do we even get client-side scrying? Well, we first have to develop Client Storage. This is smart storage that can hold a few different things (see the sketch after this list):
Hashed Blobs
Identities (This is the user’s identities, we don’t want to gossip these around on the node side)
Random user data (the client can act as a sort of "wallet"; further, private local code can be stored here if so desired).
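Here is a sketch of what such storage could look like from the VM's side; ClientStorage and its KV backend are hypothetical names for this discussion, not existing modules:
defmodule ClientStorage do
  # Hashed blobs, keyed by the sha_256 of their contents
  def put_blob(data), do: KV.put(:blobs, :crypto.hash(:sha256, data), data)
  def get_blob(hash), do: KV.get(:blobs, hash)

  # Identities stay strictly local; they are never gossiped node-side
  def put_identity(name, identity), do: KV.put(:identities, name, identity)

  # Random user data, e.g. wallet-style records or private local code
  def put_user_data(key, value), do: KV.put(:user_data, key, value)
end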
Further, we can imagine a scenario where the user has references to data they don't have locally. To handle these scenarios properly, we must have read-only transactions: that is to say, transactions that bypass ordering and do not mutate state on the node, used to properly sync currently unknown data.
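Putting these together, a hedged sketch of syncing a missing reference, assuming the hypothetical ClientStorage above plus an equally hypothetical ReadOnlyTx module:
defmodule ClientSync do
  # If a referenced blob is missing locally, fetch it via a read-only
  # transaction (which bypasses ordering and mutates no node state),
  # then cache it locally for future scries.
  def ensure_local(hash) do
    case ClientStorage.get_blob(hash) do
      nil ->
        data = ReadOnlyTx.fetch(hash)
        ClientStorage.put_blob(data)
        data

      data ->
        data
    end
  end
end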
However, the Client Side VM is only half of the equation; we also need to think about when references get dereferenced, from both the code side and the VM side (client and node alike).
Namely, should we have manually dereferenced or automatically dereferenced data? Most GC'd languages just have automatic dereferencing, with some ability to peek under the hood at references. What is wanted here is, I think, more dependent upon the programming language model we want, and I suspect the answers for Juvix and AL may diverge. What won't change, however, are the questions of how and when we handle accessing data, and whether the data we send to the node should still contain references.
An aside: it's customary for referenced types to keep their types; thus if the label of a resource is of type X, a reference to it is of type Reference X, not suddenly a field element.
References cannot exist by the time one calls the prove function in the ZK case, as you need all data upfront, and thus the system should do zero fetching by submission time. For the "transparent" case we have a bit more flexibility, but we should not let this fact blind us to the amount of work it takes to properly deal with these questions.
On what not having references gets us.
Now let us imagine a world where we did not have references in the resource machine. In this case, what changes?
I’d argue not much.
The interface to the resource machine may not have to change much at all! Namely, we can fake references in Juvix, and thus the user would form transactions as they do now, just with fake references! We can already fetch data via the indexer, which is what we'd have to do anyway if no client-side storage is available.
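To illustrate what faking references could mean in practice (a sketch; Indexer.lookup is an assumed name):
defmodule FakeRef do
  # A "fake" reference is just the hash of the data...
  def make(data), do: :crypto.hash(:sha256, data)

  # ...and resolving one is an indexer query, which is exactly what we'd
  # have to do anyway without client-side storage.
  def resolve(hash), do: Indexer.lookup(hash)
end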
On the Modest Proposal for Preventing the References of Ill Conceived Plans from Being a Bother to The Applications or Anoma, and for Making them Beneficial to the Public.
Now, since we are talking about what we can do for the devnet, let us consider a proposal put forward by @ArtemG and accepted as somewhat uncontroversial:
References are used for/in label, logic proofs, value, and app_data (?) [let's get a full list of things to be referenced]
The values of these references are kept in two possible places:
app_data, in the format {data, :keep} or {data, :discard}
Storage under the key sha_256(data)
When a reference is to be accessed, it can hence be accessed either from
a. app_data
b. scried from storage (this can only happen inside of a TX during execution or through indexing)
If a transaction passes verification checks, etc., then we take all elements of the form {data, :keep} in the app_data field and write them at the timestamp of the transaction with key sha_256(data).
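A minimal sketch of that final write-back step, assuming a hypothetical Storage.write; the comprehension's pattern match keeps {data, :keep} elements and skips {data, :discard} ones:
defmodule WriteBack do
  # After verification, persist every {data, :keep} element of app_data
  # under sha_256(data) at the transaction's timestamp.
  def persist_kept_blobs(transaction, timestamp) do
    for {data, :keep} <- transaction.app_data do
      Storage.write(:crypto.hash(:sha256, data), data, timestamp)
    end
  end
end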
On Refusing the Modest Proposal
Let us reflect on how this works in practice, given that none of the work in the Wardley map is currently done, and that we wish for proper error messages for the user.
Imagine the below code (written in Elixir) is Juvix code.
# First we construct our transaction; we have already indexed.
# Note: this requires separate compilation
def main1(resources, transactions, etc...) do
  createMyTransaction(resources, transactions, etc...)
end

# After we create our initial transaction, called myTransaction,
# from now on we have to get all the references from the indexer...
# (Defeats the point of refs anyways, as we have to grab it all, always)
# Meaning we call out to the JS or Elixir to then scry/index it all.
def main2() do
  grabAllReferences(myTransaction())
end

# After we ask the indexer for all the hashes,
# we now put together our transaction to see if it proves.
# Now we can submit this online.
# We could also submit `myTransaction` as well, since
# transparent proofs aren't real, and the node will have to deref itself,
# but we do this so we know our proof is true.
# Note: this requires separate compilation
def main3(references, public, private) do
  construct_fullyderefed_tx(myTransaction(), references)
  |> prove_transaction(public, private)
end

@spec withLookedupData([any()]) :: %{hash() => any()}
def withLookedupData(answers) do
  myTransaction()
  |> grabAllReferences()
  # This also takes myTransaction, as it could have answers to refs
  |> correlateDataWithHashes(myTransaction(), answers)
end

# Note that in our prove context we either need to work on a transaction
# with a context (i.e. the answers) always, or replace the refs.
# We can't actually replace the refs, because Juvix is typed;
# it will complain.....
# YOU WILL NEED MONADS TO PASS THIS AROUND TO CHECK oOoOoOoO
# A State monad at least, plus Random if you want.
# This all has to be cast in Juvix.
@spec construct_fullyderefed_tx(Tx.t(), %{hash() => any()}) :: {Tx.t(), %{hash() => any()}}
def construct_fullyderefed_tx(trans, answers) do
  ...
end

def grabAllReferences(transaction) do
  []
  |> addReferencedLabels(transaction)
  |> addReferencedData(transaction)
  |> addReferencedLogicProof(transaction)
  |> addReferencedAppData(transaction)
end
This computation is done in 3 stages:
We compute our transaction; the queried data may contain refs. This is one compilation in Juvix.
We then get all our refs in Juvix, to send to the indexer. This requires some JS or Elixir in between to smooth it over. Note that the data we get from the indexer may itself contain references, so we'd have to query again and again until no references are left.
Now we have to compile Juvix a second time to correlate the transaction's references with their data. We can do this via a monad that all user code must run in, to ensure correct computation.
At this point we need to submit the transaction with its environment, as the Juvix code cannot dereference while running, meaning it must submit back all the data it gathered either way.
Note that we have to completely dereference the transaction code during proving if we want to be able to test the transaction offline (ensuring verification will pass, etc.).
These problems arise on the Juvix side, since we need to do some work to prove offline.
An Engineering view of user flow
Further, while having these discussions, we've made a Wardley map covering what we think user flow requires:
Thanks for the response here. References deserve a comprehensive treatment – I first want to clarify the client/node distinction which seems like a necessary first step in the discussion here.
I think I get a rough sense of what distinction you’re going for here, I agree that such a distinction would be useful, and I’m happy to consider using the word “client” in roughly the way that you suggest. However, I want to clarify a few points here to make sure that we’re on the same page:
Where would functions like P2P, distributed (synchronized) storage, and solving live?
To me the crucial distinction is not really one of functions - e.g. Anockma - which are abstractions that could be used anywhere, but rather one of state, where:
a. The node would deal with state which is, at least possibly, distributed, and all operations which involve the network, and
b. The client would deal with state which is always local and operations on that local state which do not involve network operations (but might produce things like transaction functions or objects which are later sent over the network).
Is this distinction of state aligned with your intuition here, or not?
Well, it depends on precisely what you mean. If you were to ask me about, say, an "indexer", I'd say outside of the system; however, if it were rephrased to be about the "RO transaction", then I'd say inside the node itself.
Distributed storage is part of the "Node Storage". There is likewise a Local Cache that can automatically sync it to the client, which belongs more in the Client.
Solvers are also client side (if we had to pick node vs client); they aren't part of the distributed database, are outside the system, and I'd argue potentially interact with both the client and the node.
Note that this would be a part of the same OS, just not in either of these components, I'd argue.
Yes, I believe what you said is accurate to my intuition.
There is a caveat on (b), in that the client may invoke some functionality that goes online, like an RO transaction, to populate the cache.
Hmm, thanks. I’m still not quite convinced that we have a 100% conceptually clear delineation here. Do you imagine that each engine (or Elixir actor) is either part of the node or the client, but not both? Or could an engine be part of both in the context of processing different messages? Operationally, how would this division work?
A particular engine could be used in both; nothing prevents spawning an actor from some node code in the client (in fact this is trivial in Elixir). However, I'm unsure of a case in which the actor would be exactly the same in both. There are shared parts: for example, our nock-vm is practically the same code in the client and the node, but in the near future we will pass a different scry function to handle client-side scrying. There will likely be other changes later, but the code is rather general, so as to avoid any extra work due to this distinction.
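Sketched in code (Nock.run, NodeScry, and ClientScry are illustrative names, not our actual module layout), the distinction is just a parameter:
defmodule VM do
  # Same VM code path for node and client; only the scry callback differs.
  def run(program, subject, :node),
    do: Nock.run(program, subject, scry: &NodeScry.scry/1)

  def run(program, subject, :client),
    do: Nock.run(program, subject, scry: &ClientScry.scry/1)
end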
If the actor processed different messages depending if it’s in the node or client, I’d argue it’s a different actor. I can imagine in some very specific scenarios that they can be the same.
Let us imagine an actor A such that, for the node and the client, it stores the same state, but the interface to that state should be different. Then I'd say we have a few options on how to proceed:
1. Make them the same actor, and use a library that limits who can send what message to A (@ray mentioned a library that would let us build walls around our actors, message-wise).
This has the downside that the Client must be co-located with a node, and thus we can't always do this (we can program fallback behaviour around this; it's not too big of a deal, just details that need to be thought through here).
2. Modify 1. so that the client spawns the same actor as the node but with the API wall set, and they each have different ones.
This has the downside that they have different states, and so does not satisfy this weird case.
3. We do 2., but we also do state syncing as an optimization when a client knows of a "node it trusts" (same Erlang node).
Hmm, I’m still not quite convinced that this distinction is clear, although I agree that there is some distinction. It seems to me like what we want, at a higher level, is to have the ability to separately decide:
Where, as in on which agent, code is actually executed, and
Whose view of the system, at what time, are we simulating in that execution, and
What information should they be able to access.
Here are a few examples of different combinations:
I might want to execute some code in the “OS environment” (for now, basically Anockma as in the RM, with some details around state access), in a local environment that has access to my local state (such as the ability to sign with my keys), and a view of the distributed state that is the same as a controller’s declared view at a particular block height.
I might want to execute some code in the “OS environment” without any access to my local state (for example, if I’m replicating the execution originally performed by a controller).
I might want to execute some code in the “OS environment”, composing the perspectives of several known controllers, in order to simulate how a multi-transaction interaction might go.
I might want to execute some code in the “OS environment”, in a local environment with access to only some of my local state (maybe because I don’t have full trust in the code that I’m executing, and want to carefully sandbox whatever it has access to).
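For concreteness, here is one hedged way to write these parameters down as data, loosely matching the first example; every name here is a placeholder for discussion, not a design:
defmodule ExecutionContext do
  # Illustrative only: making the three parameters explicit as data.
  def example(controller_id, block_height) do
    %{
      where: :local_device,                             # on which agent the code runs
      view: {:controller, controller_id, block_height}, # whose view, at what (logical) time
      access: [:viewing_keys, :sign]                    # which local information is reachable
    }
  end
end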
More combinations are possible, in principle an infinite number considering the potentially unbounded number of controllers and possible distinctions in local state access permissions. I think it should be possible to define a rough mathematical model of what we’re actually trying to do here, related to views of distributed state, but before attempting that myself, I want to pull in @isheff and @Jamie to this discussion as I think they might have valuable input.
Thank you for pointing me to this thread @cwgoes. What jumps out at me is this interesting list of parameters:
Where, as in on which agent, code is actually executed, and
Whose view of the system, at what time, are we simulating in that execution, and
What information should they be able to access.
The “where” and “when” parameters seem clear. It’s not clear to me how “whose view of the system” and “what information [we] can access” are distinct — surely the two are nearly the same thing, in the sense that what information we can access determines whose view of the system we have, and vice versa.
Comment 1
I can make a superficial comment: the logic and model that I built to study Heterogeneous Paxos has contexts of the form (n,p,O,\varsigma), where n is time, p is place, and O is an open set which is not identical to the "whose view of the system" parameter above, but which does resemble it (\varsigma is an interpretation, which for this conversation we can treat as fixed). Dynamic modalities are available for all parameters, i.e. we can move around in time and space, and can change the open set.
Message logic is simpler. It just has time n, space p, and \varsigma, with corresponding dynamic elements as you observe.
The toy model of intents has no explicit time, but it does have space. It is much more precise about what data is: data consists of linear resources in the sense of Linear Logic, which can be used to represent predicates, resources, and abilities (see Comment 2 below).
I like your list because it suggests a common structure, that it might be interesting to tease out: we have a point in time n and in space p, and a view from there (an O or similar), and then some stuff we have or stuff we can do (linear resources).
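Tentatively, and in my own notation rather than that of any of the three models above, the common structure might be contexts of the form (n, p, O, \Gamma), where n is time, p is place, O is a view (an open set or similar), and \Gamma is a collection of linear resources; a judgment (n, p, O, \Gamma) \vdash \phi would then read "at time n and place p, with view O and resources \Gamma, \phi holds", with dynamic modalities moving each parameter.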
Comment 2
“Where” and “when” are clearly modal parameters. Fine; what about “whose view of the system” and “what information we can access”?
Only example 1 in your post seems to clearly distinguish between “whose view of the system” and “what information we can access” (namely: “access to my local state, including the ability to sign with my keys”, and “controller’s declared view at a particular [logical time]”). To me, this hints that “view of the system” may correspond to a resource that unlocks certain actions rather than to a modal parameter. But I’d need to see more examples.
At times there will be a distinction such as: “I want to run code in an environment that can access my local state, but only part of it”, where “my local state” is understood as, say, the state known to one operational instance of the protocol running on one physical computer. It seems possible to me that with an appropriate notion of identity we could collapse these two as you observe.