Anoma Engines: An argument against sender categorization of messages

mariari · January 20, 2025, 7:49pm

Engines are multi-mailbox actors. In particular, the sender decides which of the receiver's mailboxes receives the message.


```juvix
type EngineMsg M :=
  mkEngineMsg@{
    sender : EngineID;
    target : EngineID;
    mailbox : Option MailboxID;
    pattern : CommunicationPattern;
    kind : EngineMsgKind;
    msg : M;
  };

MailboxCluster (S M : Type) : Type := Map MailboxID (Mailbox S M);

type EngineEnv (S Msg : Type) :=
  mkEngineEnv@{
    state : S;
    mailbox : MailboxCluster S Msg;
    acq : AddressBook;
  };

type Engine (S E M C R : Type) :=
  mkEngine@{
    status : EngineStatus;
    cfg : EngineCfg C;
    state : EngineEnv S M;
    behavior : EngineBehaviour S E M C R;
  };
```

In the types above, we can see that EngineMsg contains a mailbox field, which specifies the mailbox in the MailboxCluster that the message ought to go to. I argue that this goes against the design principles of the event system within Anoma. Before discussing how, let us discuss what the event system in Anoma is.
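The following minimal Python model illustrates delivery under these types. It is a sketch, not the specs. The names `deliver` and `DEFAULT_MAILBOX` are hypothetical, and falling back to mailbox 0 when `mailbox` is absent is an assumption. The point is that the receiver plays no part in choosing the mailbox.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Optional

DEFAULT_MAILBOX = 0  # hypothetical default used when the sender gives none

@dataclass
class EngineMsg:
    sender: str
    target: str
    mailbox: Optional[int]  # chosen by the *sender*
    msg: Any

def deliver(cluster, m: EngineMsg) -> int:
    """Put m into the mailbox the sender named (or the default one)."""
    box = DEFAULT_MAILBOX if m.mailbox is None else m.mailbox
    cluster[box].append(m)
    return box

cluster = defaultdict(deque)  # mailbox ID -> queue of messages
deliver(cluster, EngineMsg("EngineB", "EngineA", 1, "important"))
deliver(cluster, EngineMsg("Spammer", "EngineA", 1, "buy now!"))  # nothing stops this
deliver(cluster, EngineMsg("EngineC", "EngineA", None, "hello"))
```

Nothing in `deliver` consults the receiver. Any sender can place low-value mail into whichever mailbox the receiver treats as important.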

The event system within Anoma was designed to let consumers of events filter them according to their own priorities. This means that the creator of an event does not have to predict the potential use cases of its consumers and preemptively categorize its events. Instead, consumers can write custom filters that work precisely for their use cases.

A good example can be found in the Elixir implementation of the event broker:

```elixir
deffilter LoggingFilter do
  %EventBroker.Event{
    body: %Node.Event{body: %Anoma.Node.Logging.LoggingEvent{}}
  } ->
    true

  %EventBroker.Event{body: %Node.Event{body: %Mempool.TxEvent{}}} ->
    true

  %EventBroker.Event{body: %Node.Event{body: %Mempool.ConsensusEvent{}}} ->
    true

  %EventBroker.Event{body: %Node.Event{body: %Mempool.BlockEvent{}}} ->
    true

  _ ->
    false
end

def logging_filter() do
  %__MODULE__.LoggingFilter{}
end

## Used like this

EventBroker.subscribe_me([
  Node.Event.node_filter(node_id),
  logging_filter()
])
```

This creates a filter that selects the "logging" events of the system, namely LoggingEvents, TxEvents, ConsensusEvents, and BlockEvents. The subscribe_me call then subscribes to every event that matches both the node filter for a particular node and this logging filter.
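We can read the subscription as a conjunction of filters: an event is delivered only if every filter in the list accepts it. Under that reading, the idea fits in a few lines of Python. This is a hypothetical sketch. `Event`, `node_filter`, `logging_filter`, and `subscribe` only mirror the Elixir names, and the empty classes stand in for the real event structs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Event:
    node_id: str
    body: Any

# Hypothetical stand-ins for the Elixir event structs.
class LoggingEvent: pass
class TxEvent: pass
class ConsensusEvent: pass
class BlockEvent: pass
class OtherEvent: pass

Filter = Callable[[Event], bool]

def node_filter(node_id: str) -> Filter:
    return lambda e: e.node_id == node_id

def logging_filter() -> Filter:
    kinds = (LoggingEvent, TxEvent, ConsensusEvent, BlockEvent)
    return lambda e: isinstance(e.body, kinds)

def subscribe(filters: List[Filter]) -> Filter:
    # The consumer, not the event's creator, decides what it wants to see.
    return lambda e: all(f(e) for f in filters)

wants = subscribe([node_filter("n1"), logging_filter()])
```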

If we accept the argument behind the EventBroker design, the same reasoning applies to mailbox clusters: the receiving Engine should determine which mailbox a message is placed in. This spares other Engines from having to know the internal details the receiving engine cares about. It also lets the receiving Engine classify messages by what it cares about most. Further, we can apply the same kind of deffilter as in the Elixir code above to achieve the intended effect.
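Here is a minimal sketch of receiver-determined categorization, in the spirit of `deffilter`. It assumes a hypothetical `Receiver` that holds an ordered list of its own (predicate, mailbox name) rules. The message keeps the sender's `mailbox` field only to show that it is ignored.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

@dataclass
class EngineMsg:
    sender: str
    mailbox: Optional[str]  # the sender's claim; ignored by the receiver
    msg: Any

Rule = Tuple[Callable[[EngineMsg], bool], str]

class Receiver:
    """An engine that sorts its own mail using rules it owns."""

    def __init__(self, rules: List[Rule], default: str = "inbox"):
        self.rules = rules
        self.default = default
        self.mailboxes = defaultdict(deque)

    def classify(self, m: EngineMsg) -> str:
        # First matching rule wins; m.mailbox plays no role at all.
        for predicate, box in self.rules:
            if predicate(m):
                return box
        return self.default

    def receive(self, m: EngineMsg) -> str:
        box = self.classify(m)
        self.mailboxes[box].append(m)
        return box

a = Receiver([
    (lambda m: m.sender == "EngineB", "important"),
    (lambda m: "offer" in str(m.msg), "promotions"),
])
```

The receiver expresses its priorities simply by ordering its rules. A sender that claims the "important" mailbox gains nothing.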

Therefore, I argue that to unify the system design, the receiving engine ought to decide which message goes into which mailbox. In a follow-up post, I will further argue that we should open this up to engines in a reflective way, so that they can determine how they read their own mail.

Thank you for reading

cwgoes · January 22, 2025, 10:06am

Thanks for the write-up. All else being equal, it makes sense to me that how messages are processed should in general be up to the recipient, not the sender. If the sender wishes to indicate priority in some way, that can be done in an opt-in fashion (where the receiver can choose how to use the sender's recommended priority information).

I want to ask one clarifying question: you refer to an “event system within Anoma”. What exactly is this event system? Is this just a name for “all the messages sent between engines” (or actors)? Or is it something else (e.g. a subset of those messages), and if so, what (e.g. what delineates the subset)?

@graphomath Can you provide a refresher and/or links in this thread to the previous discussion and rationale behind multiple mailboxes and what purpose they are supposed to serve? I think we’d better refresh ourselves with that context and synthesize these perspectives in order to figure out how to move forwards here.

graphomath · January 22, 2025, 12:13pm

Previous discussion

The previous thread started with a mention of event design; the rest is much less clear. I could try to summarize the rest, but would rather wait until after the write-up of the event design (hopefully the reason for this reluctance will become clear at the end). For now, let us quickly recall the very short story of "Why multiple mailboxes?".^[1]

On multiple mailboxes

The idea of having several mailboxes within each engine (think actor, if you like) is inspired by two papers^[2]. Right now, having several mailboxes is not much more than a nudge to the specification writer: write engines that sort received messages that cannot be processed immediately into one of the mailboxes. This sorting is mainly based on the actual message. However, the right mailbox may turn out to be the one identified by the mailbox ID that the sender kindly provided^[3] as part of the "address" (instead of just the default mailbox ID 0). Note that this may be the "wrong" mailbox ID, whether by accident, out of malice, or due to a bug in the protocol specification or implementation. The main intuition (which we may want to write up in clear terms) is that each mailbox's state keeps track of an ongoing but interrupted "conversation". After all, engines can be in several "conversations", and they may, or even have to, switch contexts. However, to be in line with the bulk of actor model variations out there, they can only process one message at a time.
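The "one conversation per mailbox" intuition can be sketched as follows. This is a hypothetical Python model, not the specified engine, and it makes three assumptions:

- each open mailbox carries the queue and state of one conversation;
- the sender-provided ID is honored only if it names a conversation the engine actually opened, and otherwise the message falls back to the default mailbox 0;
- `step` processes exactly one message per call, rotating between mailboxes to switch context.

```python
from collections import deque
from typing import Any, Dict, List, Optional

DEFAULT = 0  # the default mailbox ID

class ConversationalEngine:
    """Each mailbox holds the queue and state of one interrupted conversation."""

    def __init__(self):
        self.queues: Dict[int, deque] = {DEFAULT: deque()}
        self.states: Dict[int, List[Any]] = {DEFAULT: []}
        self._order: deque = deque([DEFAULT])  # round-robin order of mailboxes

    def open_conversation(self, mailbox_id: int) -> None:
        self.queues[mailbox_id] = deque()
        self.states[mailbox_id] = []
        self._order.append(mailbox_id)

    def enqueue(self, msg: Any, hint: Optional[int]) -> int:
        # The sender's ID is only a hint: unknown IDs (accident, malice,
        # or a bug) fall back to the default mailbox.
        box = hint if hint in self.queues else DEFAULT
        self.queues[box].append(msg)
        return box

    def step(self) -> Optional[int]:
        """Process exactly one message, switching context between mailboxes."""
        for _ in range(len(self._order)):
            box = self._order[0]
            self._order.rotate(-1)  # move this mailbox to the back
            if self.queues[box]:
                self.states[box].append(self.queues[box].popleft())
                return box
        return None  # no pending messages anywhere
```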

In summary, on multiple mailboxes: first, we may want to write up in long form what several mailboxes can do, both in principle and in particular. Second, I already have one takeaway. The quick hack of just using natural numbers for mailbox IDs should be fixed by pairing each mailbox ID with something human-readable, so that not only a machine but also a human could understand them easily.
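One possible shape for such a pairing is sketched in Python below. The numeric part stays the identity. A hypothetical `label` field is excluded from equality and hashing, so it exists purely for human readers (specs, logs, debugging).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MailboxID:
    """A mailbox ID paired with a human-readable label.

    Only `number` takes part in equality and hashing; the label is
    purely for people reading specs, logs, or debugger output.
    """
    number: int
    label: str = field(default="", compare=False)

    def __str__(self) -> str:
        return f"{self.label}#{self.number}" if self.label else f"#{self.number}"

# Hypothetical example label.
TX_REPLIES = MailboxID(3, "pending-tx-replies")
```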

Forward, after a short aside

As a short aside, the multi-mailbox paradigm has a by-product: a canonical separation of (part of) the local engine state. This opens the road to good ways^[4] of having parallel threads within the isolated turn of an engine, since several threads could work on different mailboxes in parallel as part of one isolated turn. But yes, we desperately need a write-up of

- the isolated turn principle for engine processes,
- what several mailboxes are good for (and whether we should reconsider the terminology),
- a comparison with ideas that are present in the event system of the current implementation.

I am happy to work out all three of them, and I always welcome suggestions for how we can make things simpler or clearer.
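The aside about parallelism within an isolated turn can be illustrated with a hypothetical Python sketch. The head message of every non-empty mailbox is handled by a separate task. Each task works only on a copy of its own mailbox's slice of state, and all results are committed together at the end of the turn. The function and handler names are made up, and the thread pool only illustrates the structure; it says nothing about actual speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical per-mailbox handler: (mailbox state, message) -> new mailbox state.
Handler = Callable[[List[str], str], List[str]]

def isolated_turn(mailboxes: Dict[int, List[str]],
                  states: Dict[int, List[str]],
                  handler: Handler) -> Dict[int, List[str]]:
    """Handle the head message of every non-empty mailbox in parallel.

    Tasks share no mutable data: each gets a copy of its own mailbox's
    state, and the turn commits all results at once.
    """
    work = [(box, queue[0]) for box, queue in mailboxes.items() if queue]

    def run(item: Tuple[int, str]) -> Tuple[int, List[str]]:
        box, msg = item
        return box, handler(list(states[box]), msg)

    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(run, work))

    for box in results:  # commit: consume the handled messages
        mailboxes[box].pop(0)
    return {**states, **results}

mailboxes = {0: ["a", "b"], 1: ["c"], 2: []}
states = {0: [], 1: [], 2: []}
new_states = isolated_turn(mailboxes, states, lambda s, m: s + [m.upper()])
```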

[1] Here's another copy of the link to the thread.
[2] "Special Delivery: Programming with Mailbox Types" and "Selectors: Actors with Multiple Guarded Mailboxes".
[3] For the record, it was never intended that the "sender decides what mailbox of the receiver receives the message". That conclusion came as a big surprise to me (and it should not follow, but this is for another conversation/thread).
[4] In particular, we can stick with the very simple approach of having just one message/event/trigger at a time, by default.

cwgoes · January 24, 2025, 8:41am

This sounds like it’s exactly what @mariari wants, which is receiver-determined prioritization.

But how would the sender know what/which mailbox IDs to use? What if different senders use the same mailbox ID? What if they use the same mailbox ID by accident with different intended semantics? This sure sounds like it’d require a whole lot of coordination between senders, which I doubt we really meant (or want) to require here. If we want senders to be able to provide priority information, it seems to me like the most we could get is priority information between different messages sent by the same sender, as a suggestion/request to the recipient, and the recipient will have to sort out any priority questions between different senders. I think – at least in this case – reasoning about this problem from first principles will be much faster than attempting to reconcile variations or contradictions between different formulations of the “actor model” in the research literature.
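The "most we could get" described above can be sketched in Python under stated assumptions. A hypothetical receiver keeps one priority queue per sender. It uses each sender's hint (lower is more urgent) only to order that sender's own messages, and it arbitrates between senders with its own policy, here plain round-robin.

```python
import heapq
import itertools
from collections import OrderedDict
from typing import Any, Optional, Tuple

class SenderScopedPriorities:
    """Sender hints order only that sender's own messages; the receiver
    arbitrates between senders (here with plain round-robin)."""

    def __init__(self):
        self.queues = OrderedDict()    # sender -> heap of (hint, seq, msg)
        self._seq = itertools.count()  # FIFO tie-break for equal hints

    def receive(self, sender: str, msg: Any, hint: int = 0) -> None:
        # Lower hint = more urgent, but only relative to the same sender.
        heapq.heappush(self.queues.setdefault(sender, []),
                       (hint, next(self._seq), msg))

    def next_message(self) -> Optional[Tuple[str, Any]]:
        for sender in list(self.queues):
            queue = self.queues[sender]
            if queue:
                self.queues.move_to_end(sender)  # served sender goes to the back
                return sender, heapq.heappop(queue)[2]
        return None

r = SenderScopedPriorities()
r.receive("EngineB", "routine", hint=5)
r.receive("EngineB", "urgent", hint=0)
r.receive("Spammer", "BUY NOW", hint=-100)
```

With this policy, an exaggerated hint only reorders the sender's own messages. It cannot push that sender ahead of others.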

Now I am more lost. Are mailbox IDs intended not to be part of the engine specification, but some natural language guidelines to implementors? That would be a different discussion, but they certainly do appear to be part of the current engine specification. Or do you mean to suggest English names as some sort of guidelines to the implementor? But why would an implementor need to know this, if message prioritization is handled automatically by engines? I’m not following here.

graphomath · January 24, 2025, 10:08am

I am glad that it sounds that way. However, a proper comparison with the event system would be nice to have, so that everybody can come to this conclusion without needing to read up on "everything".

jonathan · January 24, 2025, 10:24pm

mariari:

Engines are multi mailbox actors. In particular the sender decides what mailbox of the receiver receives the message.
type EngineMsg M :=
  mkEngineMsg@{
    sender : EngineID;
    target : EngineID;
    mailbox : Option MailboxID;
    pattern : CommunicationPattern;
    kind : EngineMsgKind;
    msg : M;
  };

Look at the definition again. The mailbox in an engine message is of type Option MailboxID.
When you send a message, you either know which of the target's mailboxes to put the envelope (message) in, and write something like "mailbox := some 1223", or you don't know and don't care, and write "mailbox := none". I'd say there is freedom here.

My take on engine-mailboxes.

Having mailboxes leads to efficient communication. Consider the following scenario; it may clarify why mailboxes are convenient and provide context switching.

Say EngineA is waiting for an important message from EngineB on a Black Friday.
EngineA starts receiving so many offers in the mail that it’s almost impossible to know whether EngineB sent the mail or not.
With only one mailbox, there is no remedy but to filter all messages by brute force.
However, if EngineA has dedicated mailboxes, one for "important" messages and the other for "promotions", finding EngineB's message is efficient. Sure, as long as EngineB knows about EngineA's mailboxes.
If EngineB is unaware of EngineA's mailboxes, there is no improvement for EngineA. Even if EngineA has several mailboxes, EngineA needs to organise the mail by itself to find EngineB's message. But that's another problem, one that can be solved if the engine runs some filtering algorithm over its messages. See message selectors.
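For the selectors mentioned above, here is a minimal Python sketch of selective receive. The receiver scans a single mailbox for the first message satisfying a predicate of its own choosing, and it leaves all other messages in their original order. `select` is a hypothetical helper, and this is still a linear scan; the gain is that the receiver, not the sender, decides what counts as important.

```python
from collections import deque
from typing import Any, Callable, Optional

def select(mailbox: deque, wanted: Callable[[Any], bool]) -> Optional[Any]:
    """Remove and return the first message satisfying `wanted`.

    Non-matching messages stay in the mailbox in their original order.
    """
    for i, msg in enumerate(mailbox):
        if wanted(msg):
            del mailbox[i]  # safe: we return before iterating further
            return msg
    return None

inbox = deque([("Shop", "offer 1"), ("Shop", "offer 2"),
               ("EngineB", "reply"), ("Shop", "offer 3")])
from_b = select(inbox, lambda m: m[0] == "EngineB")
```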

graphomath · January 27, 2025, 3:00pm

EDIT: warning: this post is not easy to read because several topics are mixed

- priority queues for message inboxes of processes:
  - priorities managed by the runtime/scheduler: this amounts to "external" sorting of the message queues, or of the order in which an actor is fed its messages;
  - priorities managed by the process: this "internal sorting" amounts to receiver priorities, based on the current state of a process/actor;
- succinct references to previous conversations, which can be "abused" for either of the above, but whose primary purpose is to keep track of different "conversations" (useful for error detection/avoidance).

In short, the first kind, priorities, serves to improve performance, ensure liveness, etc. References to previous messages are first and foremost for error identification (and possibly for compile-time checks of protocol conformance).

Let me address this question in three bullets, namely

- the simplest possible example (of how a sender would know),
- an "interlude" on the design principle, and
- a fairly general instantiation of how (and why) a sender can (or should) know a mailbox ID.

This is to make sure that I am not too terse (for once).

in the simplest case, …

… the mailbox ID is like a reply-to address in e-mail communications: some engine process p expects an asynchronous reply from an engine process q and thus creates a new, opaque/unguessable mailbox ID ν, and sends it to q so that q then can let p know about the context of the “conversation” even before the actual message processing is started;

by design, …

… each mailbox ID that a process p will accept for future incoming messages must be explicitly shared with another process q before it can be used as part of the addressing information of messages;

concretely, but fairly general, …

… a mailbox ID ν will be created by an engine process p whenever p expects to receive a message at some point in the future from some other engine process r (which may not even exist yet), and r should have a way to succinctly reference this "conversation". p will thus send the mailbox ID ν (as part of the payload of a message) to some engine q, such that eventually r will come into existence and also get to know about the mailbox ID ν …

… and engine r will be able to chime into the ongoing conversation ν.
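The three points above can be sketched in Python. `Process`, `new_mailbox_id`, `deliver`, and the `reply_to` payload field are hypothetical names, and `secrets.token_urlsafe` stands in for whatever produces opaque, unguessable IDs. The ID ν travels in message payloads from p to q and from q to r, and p checks it before any processing starts.

```python
import secrets
from typing import Any, Dict, List, Tuple

class Process:
    """Accepts messages only under mailbox IDs it has issued itself."""

    def __init__(self, name: str):
        self.name = name
        self.issued: Dict[str, List[Any]] = {}  # mailbox ID -> received messages
        self.errors: List[Tuple[str, Any]] = []

    def new_mailbox_id(self) -> str:
        nu = secrets.token_urlsafe(16)  # opaque and unguessable
        self.issued[nu] = []
        return nu

    def deliver(self, mailbox_id: str, msg: Any) -> bool:
        # Checked before any message processing: an ID that p never shared
        # is out of protocol (accident, malice, or bug) and is rejected early.
        if mailbox_id not in self.issued:
            self.errors.append((mailbox_id, msg))
            return False
        self.issued[mailbox_id].append(msg)
        return True

p = Process("p")
nu = p.new_mailbox_id()                      # p expects a reply later
to_q = {"reply_to": nu, "task": "validate"}  # p -> q: ν travels in the payload
to_r = {"reply_to": to_q["reply_to"]}        # q -> r: r learns ν from q
ok = p.deliver(to_r["reply_to"], "result from r")  # r joins conversation ν
bad = p.deliver("guessed-id", "unsolicited")       # never shared by p
```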

Then we should have an error.

Then we are out of protocol or have another error.

That is another good reason why it is optional. The extra coordination effort may pay off, though, if it helps error containment: stricter requirements on the allowed contexts of a message reduce the number of potential errors (at the receiver). In short, if the error is caught before message processing even starts, we detect it earlier (and do not accidentally coax the recipient into an undesirable or even invalid state).

Before I address this case, a minuscule disclaimer: I am not 100% sure what "this problem" refers to (while conceding that something could possibly be reasoned about faster from first principles, and indeed, nobody should be forced to read through the rather bewildering multitude of variations of the actor model).^[1]

So, let me state now that I would like to argue that the (primary) purpose of providing a mailbox ID as part of a message is independent of “this case”, which you have described as

If we want senders to be able to provide priority information, it seems to me like the most we could get is priority information between different messages sent by the same sender, as a suggestion/request to the recipient, and the recipient will have to sort out any priority questions between different senders.

As far as I know, nothing like "additional priority levels" for sends from one engine process p to an engine process q is planned in the "specs department" at the general level of engines. In certain cases, such a principle may be relevant for a specific pair of engines, but I have not come across such a pattern yet.

TL;DR: I think priority levels (in either of the two variations of this case) are a different topic, …, or do we need to talk more about priorities and "this problem"?^[2]

So, maybe leaving a topic for future discussion aside, I would like to propose a possible

conclusion

It may turn out that @mariari would agree that having Mailbox IDs as part of the addressing information of a message in the specs is a possible and valid design decision; the alternative is that we move them into the message payload (and accordingly the mailbox state and message queue into the engine-specific local state).

The claim is that the principle of referencing certain conversations pops up in all sorts of contexts. The question is whether we should "have to" spawn a new engine process for each such messaging context (which is operationally equivalent, but separates possibly related "ongoing conversations"), or whether we have an optional alternative mechanism to separate out this information, using MailboxIDs (up to the choice of a better name).

As a functional definition and summary of the whole story, I would propose

MᴀɪʟʙᴏxID = succinct reference to a messaging context.

I would expect this to make sense for every message sent in the system, except for messages that start new conversations. It can be implemented as a generic additional argument in the Erlang implementation. So I would expect the main challenge to be twofold: hiding the ugly boilerplate code that the current version of the specification suffers from, and making clear that a uniform datum for previous messages will also be beneficial in the communication patterns we have started discussing elsewhere.
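Here is a sketch of what "a generic additional argument" could look like, written in Python rather than Erlang purely for illustration. `Envelope`, `send`, `reply`, the `context` field, and the value `"conv-42"` are hypothetical. Only messages that open a new conversation omit the context. A `reply` helper carries the context along, so protocol code never restates it by hand.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Envelope:
    sender: str
    target: str
    payload: Any
    context: Optional[str] = None  # succinct reference to a messaging context

def send(sender: str, target: str, payload: Any,
         context: Optional[str] = None) -> Envelope:
    """Generic send: `context` is omitted only when starting a new conversation."""
    return Envelope(sender, target, payload, context)

def reply(original: Envelope, payload: Any) -> Envelope:
    """Answer within the same messaging context, without restating it by hand."""
    return send(original.target, original.sender, payload, original.context)

request = send("p", "q", "validate tx", context="conv-42")
response = reply(request, "valid")
```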

However, all of this is to ask:

- Is it now clearer what functions Mailbox IDs can have?
- Is it clearer that priorities are not their primary purpose (and that proper handling of priorities is a different matter)?

[EDIT:] NB: Mailbox IDs can of course be used to encode priorities, but if we wanted some priority mechanism, then it should be added as a separate feature (and we could of course have both: succinct message references and priority levels).

[1] My not being 100% sure is mainly due to a specific interpretation of the term "priority" in the context of EEP 76: Priority Messages. That notion may be rather different from the already existing out-of-order message processing, which is what we would want in any case (independent of design choices about MailboxIDs, as detailed below). This thread may shed some light on the differences between out-of-order message reception and priorities (in the sense of the EEP), but I think that is really a different discussion.
[2] If need be, we can start a different thread on whether priorities are different from what concerns Mailbox IDs. That thread would iron out the point that Mailbox IDs are to be considered only as part of the interfaces between communicating components and should not be abused for priorities of the other kind. Mailbox IDs (whether or not we include them in the specs language) are first and foremost intended as a means of structuring communication between communicating components.

graphomath · January 27, 2025, 4:27pm

As described above, mailbox IDs are intended to reference a "conversation" that has already started. Because mailbox IDs should be opaque/unguessable, they do not by themselves contain any meaningful information for error messages (at the receiving engine process). Yet beyond the mere fact that an error has been detected at the receiving engine, additional information for identifying and recovering from the error is desirable. So, at least in DEBUG mode, each message should come with enough information for the recipient to inform the system at large which error has occurred.
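A hypothetical Python sketch of such debug information follows. The mailbox ID stays opaque. An optional `DebugInfo` record, with made-up fields `protocol` and `step`, lets the recipient say more than "unknown mailbox" when the assumed `DEBUG` flag is set.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

DEBUG = True  # hypothetical build/run-time flag

@dataclass(frozen=True)
class DebugInfo:
    protocol: str  # which communication pattern the message belongs to
    step: str      # where in that pattern the sender thinks it is

@dataclass(frozen=True)
class Message:
    mailbox_id: str  # opaque: meaningless in an error report by itself
    body: str
    debug: Optional[DebugInfo] = None

def error_report(msg: Message, reason: str) -> str:
    """What the receiver can tell the system at large about a rejected message."""
    if DEBUG and msg.debug is not None:
        return f"{reason}: protocol={msg.debug.protocol} step={msg.debug.step}"
    return f"{reason}: opaque mailbox ID, no further information"

m = Message(secrets.token_urlsafe(16), "reply",
            DebugInfo(protocol="tx-validation", step="reply"))
```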

cwgoes · January 30, 2025, 7:24am

This makes sense to me (as does the paper which we discussed on Slack). I think what I was mostly confused by was how this discussion was entangled with that of priority levels (perhaps from @mariari’s original post), which I think is really a different topic and subject to different desiderata and constraints – it seems per

and

that we are agreed on this point. Thank you for the exposition, I found it helpful!

mariari · February 4, 2025, 12:06pm

jonathan:

Say EngineA is waiting for an important message from EngineB on a Black Friday.

EngineA starts receiving so many offers in the mail that it’s almost impossible to know whether EngineB sent the mail or not.

With only one mailbox, there is no remedy but to filter all messages by brute force.

However, if EngineA has dedicated mailboxes, one for "important" messages and the other for "promotions", finding EngineB's message is efficient. Sure, as long as EngineB knows about EngineA's mailboxes.

If EngineB is unaware of EngineA's mailboxes, there is no improvement for EngineA. Even if EngineA has several mailboxes, EngineA needs to organise the mail by itself to find EngineB's message. But that's another problem, one that can be solved if the engine runs some filtering algorithm over its messages. See message selectors.

This assumes that actors are kind and respect your preferences. Maybe you do business with that Engine for other reasons and don't want to blacklist it, but it will, by accident or on purpose, put low-quality mail into your priority mailbox. This is why, fundamentally, the sender cannot choose the mailbox, and that is the problem with the sender selecting which mailbox to send a message to.

None of my argument depends on whether there are multiple mailboxes or not. I don't wish to have that debate here, as it would be an entirely different discussion. Rather, the argument is against the sender categorizing mail, since only you know what you care about.

It is much like my email inbox, where I (the engine) choose which mail goes into which category. I have rules in ProtonMail that filter specifically for what I consider an advert or a priority, rather than some external organization imposing that on me.

degregat · February 4, 2025, 3:04pm

I agree with this. The only form of addressing in which the sender can ensure that no one else reads a message requires cryptographic operations where only the recipient possesses the secret needed to decrypt. Reliable delivery must still be ensured, e.g. by having enough redundant routes and the ability to choose which ones to use.

In general, routing of messages will be done by parties that can make decisions and enforce them, i.e. agents that own nodes can configure the internal routing for engines as they wish. We can make recommendations on how to do that, but the behavioral commitments we specify should constrain the possible implementations enough that they are isomorphic with respect to the external behavior of a node.

If filtering turns out to be too inefficient, this internal router can still cache internal routes between engines, as well as remote routes where engine messages originate from engines on other nodes.

As far as I understand, it should also not matter which specific engine out of a set of multiple instances (e.g. workers on a validator) a message is addressed to. On the contrary, the distribution of traffic, including load balancing, should always be up to the local node. Otherwise, it might be used as a vector to provoke behaviors the receiving agent does not want.
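To show traffic distribution as a purely local decision, here is a hypothetical node-local router in Python. It cycles through the worker instances of an engine type and deliberately ignores any instance the message asks for. Round-robin is just one possible policy, and the class and field names are illustrative.

```python
import itertools
from typing import Any, Dict, List

class LocalRouter:
    """Node-local router: the node, not the sender, picks the worker instance."""

    def __init__(self, workers: Dict[str, List[str]]):
        # Engine type -> instances on this node, cycled round-robin.
        self._cycles = {kind: itertools.cycle(names)
                        for kind, names in workers.items()}

    def route(self, engine_type: str, message: Dict[str, Any]) -> str:
        # Any sender-requested instance inside `message` is deliberately
        # ignored, so a remote party cannot steer traffic onto one worker.
        return next(self._cycles[engine_type])

router = LocalRouter({"worker": ["worker-1", "worker-2", "worker-3"]})
picks = [router.route("worker", {"instance": "worker-3"}) for _ in range(4)]
```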
