On a simple model for the Anoma specification

Preamble

As of today the Anoma specification is at version v0.1.3, and although we’ve made great strides in making the specification readable, I believe we can once again greatly improve the situation by simplifying the theoretical framing used in the specs.

The Anoma specification (hereafter referred to as the specs) currently aims to serve two purposes. The main purpose is to support engineering in their endeavor to implement Anoma. The second purpose is to serve as a base for proofs about theoretical properties of Anoma.

However, I believe that the current specs have prioritized the second purpose, to avoid locking themselves out of that path at a later date. This is all well and good; however, I believe our ideas on Anoma itself are still immature, and prioritizing such a goal has made the specs lackluster at serving their primary focus, while paradoxically making the second goal harder to achieve in the long run.

Theoretical Framing of the Specs

Before we suggest a new way the specs should be formed let us familiarize ourselves with the framework the specs use to explore Anoma.

The model is an actor model, with our specific actors called Engines. Here are the types in Juvix.

type Engine (C S B H A AM AC AE : Type) :=
  mkEngine@{
    cfg : EngineCfg C;            -- static information
    env : EngineEnv S B H AM;     -- dynamic information
    behaviour : EngineBehaviour C S B H A AM AC AE;
  };

type EngineCfg (C : Type) :=
  mkEngineCfg@{
    node : NodeID;
    name : EngineName;
    cfg : C;
  };

type EngineEnv (S B H AM : Type) :=
  mkEngineEnv@{
    localState : S;
    mailboxCluster : Map MailboxID (Mailbox B AM);
    acquaintances : Set EngineName;
    timers : List (Timer H);
  };

type EngineBehaviour (C S B H A AM AC AE : Type) :=
  mkEngineBehaviour@{guards : GuardEval C S B H A AM AC AE};

type GuardEval (C S B H A AM AC AE : Type) :=
  | First (List (Guard C S B H A AM AC AE))
  | Any (List (Guard C S B H A AM AC AE));

Guard (C S B H A AM AC AE : Type) : Type :=
  (trigger : TimestampedTrigger H AM)
    -> (cfg : EngineCfg C)
    -> (env : EngineEnv S B H AM)
    -> Option (GuardOutput C S B H A AM AC AE);

These types are what define an Engine. I did not post GuardOutput, but it is what theoretically defines the life cycle (we will get back to this in a bit). For now, I want to emphasize a few important details about this model:

  1. AM is the type of all Messages
  2. AC is the type of all Configurations (Static State)
  3. AE is the type of all Environments (Dynamic State)

Meaning that all messages, configurations, and environments must be known upfront and added to a sum type that all other engines must also know about. This implies that the processing of Engines and their updates will be done with all of them mixed together in some capacity (I have tried to think about how this would be done, and I personally can’t see it, as the sum types aren’t there for storing said engines, meaning it’s very confusing).

To better illustrate my point, let us look at the standard Anoma message type that all engines in the specs use for their AM generic.

type Msg :=
  | MsgIdentityManagement IdentityManagementMsg
  ...;

type IdentityManagementMsg :=
   ...
  | MsgIdentityManagementDeleteIdentityRequest RequestDeleteIdentity
  | MsgIdentityManagementDeleteIdentityResponse ResponseDeleteIdentity;

type RequestDeleteIdentity :=
  mkRequestDeleteIdentity@{
    externalIdentity : EngineID;
    backend : Backend;
  };

type ResponseDeleteIdentity :=
  mkResponseDeleteIdentity@{
    err : Option String;
  };

With this we can see a few things:

  1. We must have 2 layers of constructor wrappers: the first to uniquely identify what message it is, and a second identifying what engine should(???) process the message.
    • This actually gets complicated, as the common Response data types should be processed by the actor that requested them, so determining a life cycle is very confusing to me.
  2. We must have 2 types: one for the request and another for the response.

The issue with 1. is that there is no easy way to determine who a response should be sent to, and the problem with 2. is: where do we put the documentation about the semantics of a message? Do we put it in the request, or do we describe it in the response?

Further, since this is all convention, it’s not always Request/Response; sometimes it’s Request/Reply. This means that the specs authors must be careful and keep this in mind when writing the specs for their Engine.

Lastly, this complicates the sidebar when browsing the specs, making it very cumbersome from a UI perspective.

With some idea of how the Engine life cycle works, let me wrap up this section with the following comment. I believe the architecture around Engines in the specs is overly complex, and it does not easily form a cohesive model that will be useful for proving or explaining Anoma. I will now try to explain a plan I came up with to satisfy both requirements. The approach I’ll take puts simplicity and engineering first; however, from simplicity I believe we will find an easier model to prove. I will also give a full life cycle that is flawed, and I open the floor to the specs team to refine the hypothetical life cycle if it is useful for proofs, or maybe use the flexibility in my model to cut complexity and model the engine architecture separately from the actual engines!

On a simple model

In coming up with a simpler-to-reason-about model of the specs, I had two thoughts:

  1. If we defined the specs in Java, what shape would things take?
  2. Can we model our messages as functions for known documentation techniques?

In answering these questions I came up with the following model.

For the *DeleteIdentity types above, model them as if they were a function:

delete-identity : engine-id -> backend -> option string

Further, for life cycle reasons, we’ll want to wrap the output type in a monad.

delete-identity : engine-id -> backend -> monad (option string)

This would look very similar to what our Elixir code does:

  @spec handle_tx({Backends.backend(), Noun.t()}, binary(), t()) :: t()
  defp handle_tx(tx = {backend, code}, tx_id, state = %Mempool{}) do
    ...
    Executor.launch(node_id, tx, tx_id)
    ...
  end

Now, to properly scaffold our engine-id, we should think of engine-id as an object that accepts a message like Executor, locating the Executor we care about. In concrete Juvix terms we’d define it like this:

type Engines :=
  mkEngines@{
    executor : Executor;
    identity : Identity;
    workers : List Workers;
    ...
  };

Thus we can actually say:

some-engine-function : EngineId -> ...
  | engine-id ... := do
    Identity.delete engine-id
    ...

This gets rid of the concrete wrappers like Msg, because instead of a sum type where all the messages mix, we have concrete data that disambiguates all the kinds of engines. Note that this technique even works for multiple Engines in the structure; we just have to make the field a list or a mapping with extra disambiguation information.
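The record-of-engines idea can be sketched as follows (illustrative Python with invented names, not specs definitions): each field of the record already disambiguates its engine, so no Msg sum-type wrapper is needed to route a message.

```python
from dataclasses import dataclass, field

@dataclass
class IdentityEngine:
    known: set = field(default_factory=set)

    def delete(self, engine_id):
        # Mirrors Option String: None on success, an error string otherwise.
        if engine_id not in self.known:
            return "unknown identity"
        self.known.remove(engine_id)
        return None

@dataclass
class Engines:
    identity: IdentityEngine
    # Multiple engines of one kind: just use a list or a map.
    workers: list = field(default_factory=list)

def some_engine_function(engines: Engines, engine_id):
    # Field access replaces the two layers of Msg constructors.
    return engines.identity.delete(engine_id)
```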

The next step is to follow in Erlang’s footsteps and separate this nice interface from how the actors actually work.

In our Elixir code we have


  @spec tx(String.t(), {Backends.backend(), Noun.t()}, binary()) :: :ok
  def tx(node_id, tx_w_backend, id) do
    GenServer.cast(Registry.via(node_id, __MODULE__), {:tx, tx_w_backend, id})
  end

  @impl true
  def handle_cast({:tx, tx, tx_id}, state) do
    {:noreply, handle_tx(tx, tx_id, state)}
  end

Likewise, for our delete function we’d have a distinction between the interface and the actual handler code. The particular methods for handling this may vary depending on the implementation techniques chosen, so let us discuss some benefits of this method before proceeding.

This approach allows us to document delete in a single place, and to document it like a normal function using the normal doc tools. Further, we can use familiar UX that is known to be effective for this kind of documentation; see [1][2][3][4].

defmodule Identity

--- deletes an identity from a given backend
--- argument ...
--- returns ...
delete : EngineId -> Backend -> Monad (Option String)
 | engine-id backend := do
  ...;

Further, we can cut the AC, AE, and AM generics, along with the double layer of wrappers that comes with concretely filling out those types. We have also cut having to define 2 types for every message we’d like to define.

Now for various strategies on how delete could work:

  1. We fall back on the sum type that we saw before in IdentityManagementMsg, meaning that it looks something like this:
delete : EngineId -> Backend -> Monad (Option String)
  | engine-id backend := invoke-message (MsgIdentityManagementDeleteIdentityRequest engine-id backend)

The downside of this approach is that, besides the sum types coming back, the flow is still very unclear!

  2. We process things immediately, with the option of removing the engine scaffolding. This would look something like this:
delete : EngineId -> Backend -> Option String
 | engine-id backend := handle-delete (EngineId.identity engine-id) backend

handle-delete : (State, Configuration) -> Backend -> Option String
 | (state, config) backend := ...;

with EngineId.identity fetching the state for the real computation.

  3. Adopting a call/cast system with a monad life cycle to handle this all:
delete : EngineId -> Backend -> Option String
 | engine-id backend := call (EngineId.identity engine-id) (Delete backend)

This requires a larger wrapper type, like we saw before in IdentityManagementMsg and in the first approach; however, using it in this way gives us some advantages.

Before we explain how 3’s life cycle works to any degree, however, I want to compare the approaches and their benefits and downsides.

I believe we can mostly throw out 1, as it has a confused life cycle and isn’t going anywhere if we wish to model a full life cycle.

Model 2 is the simplest by far, as we need no monads to model it: when we “send a message” to another actor, that function is switched to right away and processing happens linearly. Note that under this approach we can keep the engine framework or toss it completely, as in this context it just changes the “run time” of how the processing is actually done, rather than being a forced, baked-in idea.

This approach works best if you want to model properties around message sends separately from the actual details of Engines. It gives you huge flexibility there, along with making way for a trivial implementation that we can try our proofs out on.
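Model 2’s “switch to the function right away” semantics can be sketched as plain function calls (Python, with invented names; the specs do not define these):

```python
def handle_delete(state_config, backend):
    state, config = state_config
    # Immediate, linear processing of the request: no monad, no mailbox.
    if backend in state:
        state.remove(backend)
        return None             # Option String: success, no error
    return "unknown backend"    # Option String: error message

def delete(engine_id, backend):
    # EngineId.identity fetches the (state, configuration) pair
    # that the real computation runs against.
    return handle_delete(engine_id["identity"], backend)
```

“Sending a message” here is just an ordinary call, so execution order is trivially the program order.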

Approach 3 makes the most sense if you want to prove properties of engines alongside the specifics of particular engine executions. This comes at a great cost, however, as the engine model itself is no longer optional and we must model everything in a very specific monad. The rest of this post will cover my attempts at characterizing this monad. I hope future posters can simplify what I came up with; however, I’m unsure how simple this can be made, as we are effectively recreating what JavaScript did when creating async computation and callback hell, just with better concurrency ideas.

On an engine life cycle

If we really take approach 3, then I came up with the following life cycle, which ought to work but is overly complicated.

  1. Anoma is a Solid State Interpreter (SSI), meaning that we can realize Anoma in steps of computation. We apply our update function, which begets messages for the next step to process, and so on until the messages are empty. Each step is a monad (the Haskell kind, not the mathematical kind) in which our computation will happen.
  2. On each step, engines look at their mailboxes to process any new messages.
  3. Engines can send two kinds of messages: 1. synchronous messages and 2. asynchronous messages. If a synchronous message is sent, the engine defers all control until its request is handled.
    • 2. is easy to realize, as we can build up a list of asynchronous messages that gets fed into mailboxes on the next step.
    • 1. on the other hand is much more complicated, and requires us to serialize our current function into a continuation and offer up our next step as a response for whenever the other Engine finishes the computation.
  4. When a synchronous message is handled by another engine, that engine builds up a monad of the asynchronous actions from its own processing, along with applying the continuation on the return, taking a step both for itself and for the engine it is responsible for continuing.
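The synchronous/asynchronous split above can be sketched minimally (Python, invented names; a sketch of the idea, not a specs definition): casts are buffered for the next step, while a call suspends the sender by packaging the rest of its work as a continuation that travels with the message.

```python
class Done:
    def __init__(self, value):
        self.value = value

class Cast:  # asynchronous send: record the message, continue immediately
    def __init__(self, msg, rest):
        self.msg, self.rest = msg, rest

class Call:  # synchronous send: suspend until a response arrives
    def __init__(self, msg, cont):
        self.msg, self.cont = msg, cont

def run(prog, respond):
    """Drive one engine's step; `respond` stands in for the rest of the
    network, answering each call immediately (a one-engine scheduler)."""
    casts = []
    while not isinstance(prog, Done):
        if isinstance(prog, Cast):
            casts.append(prog.msg)   # buffered for the next step's mailboxes
            prog = prog.rest
        else:  # Call: resume the stored continuation with the response
            prog = prog.cont(respond(prog.msg))
    return prog.value, casts

# Mirrors handle-foo from the walkthrough: cast a delete, call create, return 5.
handle_foo = Cast(("delete", "argument1"),
                  Call(("create", "argument2"),
                       lambda new_identity: Done(5)))
```

A real scheduler would defer the call’s continuation to a later cycle instead of answering it inline, which is exactly where the complexity of the life cycle lives.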

To give a better feeling for this, I will give a brief example of a life cycle in action.

Imagine we have a network where engine A is processing Foo, and no other engines have any messages queued.

On step 1, all engines besides A have no messages to process, so they quietly return with no new actions; engine A finds it must process Foo and does so with handle-foo.

defmodule EngineA

Engine : Type := Anoma.Engine Configuration State;
type Configuration := ...;
type State := ...;

handle-foo : State -> Argument1 -> ... -> LifeCycle Output
  | state argument1 ... := do
    Identity.delete argument1;
    new-identity <- Identity.create argument2;
    return 5;
end;

Upon reaching handle-foo, it is run within the overarching LifeCycle monad the SSI wraps around each step.

When we compute Identity.delete, we simply add it to the Identity mailbox for computation next LifeCycle, as it is a cast. However, when we see Identity.create, this is a call, and thus we must stop here, adding Identity.create to the mailbox. A returns these actions, with the rest of the logic (return 5) in a continuation inside the message Identity.create. Further, we set the processing logic of A to busy, as it cannot currently handle any computations.

Cycle 2 starts, and Identity has messages, whereas everyone else either has no messages or is considered busy (A).

Identity sees the delete message and processes it. Let’s assume it does no further calls and sends no messages; however, this updates the state of the Identity engine to no longer have argument1.

Cycle 2 is now done. In cycle 3, Identity has a message, whereas everyone else is waiting.

Identity processes create. As before, let us assume create only does casts, not calls, meaning that all its processing finishes. Since this is a call, the result returned by create is sent into the continuation that it was called with, and A resumes the computation, returning 5 with no extra effects to append.

With that, there are no more messages to process, and each engine has successfully executed its logic.

This isn’t the most elegant way to do this; however, it is the approach that works given a fully functional system with no real concept of threads.

Conclusion

I hope this post has been helpful in understanding the ways in which the current Engine approach is inadequate for both explaining and proving Anoma. Further, this post outlines a few techniques that let us simplify the approach without having to give up provability. Lastly, the approach displayed utilizes the primitives we have in Juvix for making the specs into a full life cycle implementation, using a single-step interpreter approach to system design.

References

[1] Juvix Documentation
[2] String (Java Platform SE 8)
[3] sets vocabulary - Factor Documentation
[4] adjoin ( elt set -- ) - Factor Documentation

4 Likes

This proposal doesn’t make sense to me.

What do you mean by “object”? What specific features of objects are important for your proposal(s)? Are you using “object” terminology loosely here, just to suggest a record type that bundles functions along with static data?

I cannot parse this sentence. “Executor” is being used as both a proper and improper noun giving the impression that there is some specific thing called “Executor”, and, separately, there’s a kind of thing which can be “an Executor”. As such, I’m not sure if your system is supposed to have a single, uber-executor, or if you’re using executor as a generic term for an engine that executes something. What does it mean to accept “a message like [an?] Executor”? And what’s locating “the Executor we care about”? The message? The proper noun “Executor”? The engine-id? These are the only three options I can see, and none make sense to me. Are you talking about the actual Executor engine from the Ordering Component?

This is supposed to define an engine id, right? Are you conflating engines with their identities? Also, what is “Executor” supposed to be? How is it different from what engines are supposed to be? And what’s the motivation for giving engines lists of workers? Are workers not engines?

Is this trying to say that we have an engine id that must be deleted, and we just say “some identity engine is supposed to delete this” by calling Identity.delete, and a separate mechanism (maybe part of the monad) disambiguates which identity engine does the work?

Or is there no identity engine at all and Identity.delete is a function of the network called to modify the state?

Or is “Identity” supposed to be an “object”, a specific identity engine instance, and you’re calling the delete method of that specific engine?

Or is EngineId, rather than being the generic type of all EngineIds, is specifically the type of EngineIds which are identity engines, and Identity.delete is a component of the record defining EngineId, which contains a function returning an element of some monad managing (network? engine?) state?

Are you implicitly calling for the removal of identity related engines altogether and that identities are intrinsic parts of all engines without separate engines (identity management, readsfor, etc) to handle them? If that’s the case, then this isn’t completely a specs issue.

Since no state is returned, how does this change the state? All these can do is consume a state and return a string. Are you implicitly assuming that handle-delete is applied to a mutable variable whose value is mutated by calls to handle-delete? Can that be made sense of in Juvix?

And what are you specifically referring to by “engine scaffolding”?

What does this mean? Why is there a notion of “real computation” separated from… what?

At times I start thinking that these are supposed to act like elements of a free monad or something similar, but then I see they return Option String, and I can’t make heads or tails of what they could be.

Considering how little I understand so far, I don’t think it’s worth trying to comment on the life cycle monad, but I may do so later if I can get some clarity.

To clarify this a bit, there’s no way a priori to define what engine should process a message since some messages (mainly response messages) can be processed by multiple different engines in different ways. The messages could all be put in a top level file, and are only placed in engine-specific files for documentation reasons. Generally, a message will be associated with a specific engine if it is the only engine which can send that message (in which case it should be called a “Response”), or if it is the only engine which can receive the message (in which case it should be called a “Request”). That is my understanding of the ideal, at any rate.

An entity that responds to messages. Executor can be a message that identifies the Executor. Depending on how you define things you may require extra information like “the id is 5”. This can be realized in Juvix as a record type.

Let engine-id = EngineID { executor = ...; identity = ...; ....}
Let executor (EngineId {executor}) = executor

The EngineID type here responds to the message executor, which gets the executor from it.

Well, given a node, we know these engines exist within the node to make things happen. Thus the id uniquely identifies each of the singleton engines.

Thus if I wanted to send a message to the mempool, then I can send something like

Mempool.submit(engine-id, my-transaction), where Mempool.submit can uniquely find the mempool, since we can identify the mempool uniquely given the engine-id.

def submit({mempool = mem-id}, my-transaction) = ...

From here we can send a message to the mempool directly, since we have mem-id bound. The specifics of this depend heavily on how you decide to implement the life cycle; however, it will always be something like this:

submit : Engines -> Transaction -> Lifecycle Submit-Output

There may exist many workers within the same “node”, and thus a worker is not unique per engine-id in the same way (its id may be engine-id + index, or maybe a map could be used as well; the details here don’t particularly matter).

I forgot to hand-delete the extra arguments; imagine I wrote

Identity.delete engine-id backend instead, which is the current signature that it has now in the specs.

I don’t think this can be realized in Juvix in general, as identities are often associated with external resources Juvix can’t operate with. I think I was too quick to dismiss having a monad for processing even with option 2; the specs haven’t contemplated this in the life cycle, and thus I assumed this particular operation was for external-style identities.

Engine scaffolding is the engine framework that is currently defined.

Well, if we get rid of the engine scaffolding, then the information an engine has mostly boils down to its dynamic and static information (environment/state and config), and hence we can just compute. I think I made a mistake by not including it under a monad in option 2.

1 Like

This strikes me as a fair critique – although before making substantial changes I’d want us all to be on the same page on a very clear alternative proposal which we agree is better (which I understand you’re aiming at here).

I don’t think that sum types imply that all engines must know about all message types in the implementation. There are many ways to correctly implement the specified semantics here – for example, simply identifying each message type with a unique tag, where a given engine must understand how to process message types with certain tags (a subset of the sum type). Per my understanding of the semantics here, that would be a correct implementation, and it doesn’t involve the engine needing to know about any of the other types. Perhaps this is more interpretation than engineering would prefer to have to do in implementing a spec – I can see that.

I agree that this model alone does not make the “execution semantics” (ref. your “the processing of engines and their updates will be done with all of them mixed together in some capacity”, which seems to be related to your concept of “lifecycle”) clear. I believe that there has been some effort to specify what those are in the form of an “engine orchestrator”; maybe @graphomath or @jonathan would have helpful input here. In my understanding, a central aim of this engine model is to be as agnostic as possible to choices of what order to process messages for different engines in, such that (a) we facilitate maximum parallelism where available and (b) implementations have freedom here. To me, this is an important aim – what do you think?

This is not what these types are doing. Together they identify a (globally unique) message. The two layers are just a way to organize message types by engine type. Nothing here is talking about specific engines.

I don’t follow this at all, why would a response be processed by the same engine which sent the request? Messages are only sent between engines. Or do you mean something different by “actor”, where an “actor” is something like a “process” which can run multiple engines?

In a certain sense this is unavoidable, there will always be two types, one for the input and one for the output. If we represent behaviors as functions as you suggest later on, there are still two types, they’re just called “arguments” and “return type” (or whatever). There may be differences in how much “boilerplate” we incur (e.g. whether we can write a function type in Juvix or not), which I agree matters practically, but it’s not really a change in the fundamental number of types involved.

I think the specs need to make request-response semantics clear and low-boilerplate, we should definitely improve on this (in some way).

There are always 2 types, see comment above. I agree that there are practical ease-of-documentation advantages to representing behaviors and message types with Juvix functions.

What do you mean by this distinction? The specs sometimes use the word “reply”, but (afaik) it does not imply a distinct semantics. Are you referring to a distinction in Erlang/Elixir/BEAM?

Agreed, this sidebar is unwieldy.

What is the “engine-id” argument for here? If we just want to talk about what it means to process a “DeleteIdentity” message, I don’t see why we’d need an engine identifier.

I’m not following this part at all.

The data structure in the specs does disambiguate all the engine types, that’s why it has 2 levels, the first level splits all the messages by engine type. What affordance are you looking for here that isn’t offered by a simple sum type? Do you want more “type reification” of sub-engines or connected engines (like the workers related to an executor)? I’m not really following here.

I don’t think the operational semantics of in what order messages are processed should have anything to do with the type of delete (or definitions of engine behaviors more generally). That feels to me like mixing concerns together which should be separated. If engines require a certain ordering of messages, they should simply wait for the messages they need before continuing. Introducing unnecessary constraints on execution semantics, especially into definitions which are supposed to be purely declarative descriptions of behavior (not statements about what order to process messages in), seems counter-productive to your goal of simplifying the specs.

From my first reading of this post, I draw out the following two critiques which make sense to me:

  1. The message and engine type representation in the specs is somewhat unclear and somewhat unwieldy, and we could probably improve it, perhaps in a way which would allow engine types and behaviors to be written as simple Juvix functions, which would reduce boilerplate.
  2. The intended execution semantics / lifecycle (or constraints thereof) are not clear, and we need to (a) define this more clearly and (b) agree on the boundary / what choices here we want to be made in the specs vs. the implementation.

I might be able to draw out more after clarifying some of the above points.

After talking to @mariari my understanding is that engineering has three basic requests:

  1. Clearly specify the execution semantics (lifecycle) of how engines are actually run and what order stuff happens in. This matters both for the specs being clear to implementors and for future formal proofs.
  2. See if we can write request-response pairs (at least) as simple Juvix functions to avoid boilerplate and potential mistakes, and to make it easy to use function documentation tools.
  3. Come up with a syntax and logic (potentially an environment monad) to support synchronous calls in engine behavior definitions and specify this clearly.
1 Like

This is on our backlog: we need to make the implicit explicit. This includes a distinction between engine instances and the processes that they “drive” / are the “motor” of:

  • Engine instances are terms of an engine type, consisting of
    • an environment
    • a configuration, and
    • a behaviour—very much as in the sense of the actor model or the original work of Moore
  • Engine processes (roughly corresponding to Elixir genserver processes) as created, run and terminated by ᴛʜᴇ engine runtime (using terminology inspired by dapr).
1 Like

Indeed, we have plans to extend the current engine format with I/O that is performed as part of an isolated turn (in the sense of the actor model). This is inspired by work on interaction trees (and the same concept can be used to reason about information flow control[1]).


  1. See, e.g., Hirsch’s work on Information-Flow Control:
    • Compositional Security Definitions for Higher-Order Where Declassification. Jan Menz, Andrew K. Hirsch, and Deepak Garg. OOPSLA 2023.
    • Semantics for Noninterference with Interaction Trees. Lucas Silver, Paul He, Ethan Cecchetti, Andrew K. Hirsch, and Steve Zdancewic. ECOOP 2023 (Distinguished Paper Award).
    • Giving Semantics to Program-Counter Labels via Secure Effects. Andrew K. Hirsch and Ethan Cecchetti. POPL 2021.
    • First-Order Logic for Flow-Limited Authorization. Andrew K. Hirsch, Pedro H. Azevedo de Amorim, Ethan Cecchetti, Ross Tate, and Owen Arden. CSF 2020. ↩︎

1 Like

This would also be beneficial for easing the writing[1] of specs. We have been brainstorming more generally about slicker ways of getting engine behaviours written (@jonathan @tg-x). As it should be a structural simplification, it may also avoid the “baroque” appearance of the documentation.


  1. While this is not a goal in itself, it is a means to an end. ↩︎

1 Like