Specs vs. Code: the Q3 round

ray · September 3, 2024, 8:46pm

We’ve gone over the specs-code differences in preparation for our discussions next week. Here are the notes:

Proper Scry and Deletion

Deletion is an overloaded word. Our storage is architected to be append-only in semantics, and I’ve posted before about the difference between “delete” (just a write) and “expunge” (a rare and destructive operation, which you do if you e.g. leaked your valuable private key into state, or choked it with a 100TB blob, or something).
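To make the delete/expunge distinction concrete, here is a minimal sketch in Python. It is not the Anoma storage code: `AppendOnlyStore`, `TOMBSTONE`, and every method name are hypothetical, and the integer clock is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical marker meaning "this key was deleted at this point in time".
TOMBSTONE = object()


@dataclass
class AppendOnlyStore:
    """Toy append-only store: every write is kept as (time, key, value)."""
    log: list = field(default_factory=list)
    clock: int = 0

    def write(self, key: str, value: Any) -> int:
        # Appends a new entry and returns the time it was written at.
        self.clock += 1
        self.log.append((self.clock, key, value))
        return self.clock

    def delete(self, key: str) -> int:
        # "Delete" is just another write; older values remain in the log.
        return self.write(key, TOMBSTONE)

    def read(self, key: str, at: Optional[int] = None) -> Any:
        # Returns the value of `key` as of time `at` (default: now).
        at = self.clock if at is None else at
        for time, k, value in reversed(self.log):
            if k == key and time <= at:
                return None if value is TOMBSTONE else value
        return None

    def expunge(self, key: str) -> None:
        # Destructive: drops every entry for the key, history included.
        self.log = [entry for entry in self.log if entry[1] != key]
```

After a `delete`, a read at the current time finds nothing, but a read at an earlier time still returns the old value. Only `expunge` actually removes history, which is why it is the operation to reach for after leaking a key.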

(Anoma Node: Proper Scry support)

Notes about the namespace buzzword

Transport Engine and Supervision Around it

Transport Engine Design

The transport design currently has some differences; it’s poor code design for a message sender to say “I am now sending a message to the Transport Engine,” since this means it has to think about the local/remote distinction in ways which are well outside of its scope. Transport, in the above design, lives on the message-reception side, handling getting messages to their destination.
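One possible reading of this, sketched in Python under stated assumptions: the sender only ever calls a single `send`, and the routing layer decides whether delivery is local or needs the network. `Address`, `Router`, and the `outbound` list standing in for the network are all hypothetical names, not the Anoma design.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Address:
    node: str    # hypothetical node identifier
    engine: str  # hypothetical engine name on that node


@dataclass
class Router:
    """Toy router: senders never mention transport explicitly."""
    local_node: str
    mailboxes: dict = field(default_factory=dict)  # engine name -> messages
    outbound: list = field(default_factory=list)   # stand-in for the network

    def register(self, engine: str) -> None:
        self.mailboxes[engine] = []

    def send(self, to: Address, message: Any) -> None:
        # Same call for local and remote destinations.
        if to.node == self.local_node:
            self.mailboxes[to.engine].append(message)
        else:
            self._transport(to, message)

    def _transport(self, to: Address, message: Any) -> None:
        # Only this code path knows that a network exists.
        self.outbound.append((to, message))
```

The point of the sketch is the shape of the sender's code: `router.send(address, message)` looks identical whether the destination engine is on this node or another one.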

Pub Sub System

There are implicitly three sorts of message possible:

one which is fire-and-forget: you send it, and if you do happen to get a “reply” causally downstream of it, that reply takes the form of an entirely new message. This is GenServer cast, and is the default sort of message.
one with synchronization requirements on the sender: when you send such a message you stop and wait for a reply to it. This is GenServer call.
one which isn’t sent to a destination, but broadcast to anyone interested in it. This is the “pubsub” pattern, implemented as EventBroker (see my design at Anoma Topics Meeting Digest: Global Data Brokers)
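The three sorts of message can be sketched in a few lines of Python. This is a single-threaded toy, not GenServer and not the EventBroker API: `Counter`, `Broker`, and the `("add", n)` message shape are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable


class Counter:
    """Toy engine holding an integer; stands in for a GenServer-like process."""

    def __init__(self) -> None:
        self.value = 0
        self.mailbox: list = []

    def cast(self, message: Any) -> None:
        # Fire-and-forget: enqueue and return immediately, with no reply.
        self.mailbox.append(message)

    def call(self, message: Any) -> Any:
        # Synchronous: handle earlier messages first, then return a reply.
        self.drain()
        return self._handle(message)

    def drain(self) -> None:
        while self.mailbox:
            self._handle(self.mailbox.pop(0))

    def _handle(self, message: Any) -> int:
        kind, arg = message
        if kind == "add":
            self.value += arg
        return self.value


class Broker:
    """Toy pubsub: publishers name a topic, never a destination."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, event: Any) -> None:
        for callback in self.subscribers[topic]:
            callback(event)
```

Note that `cast` gives the sender nothing back, `call` blocks the sender until it has a reply, and `publish` does not know or care who, if anyone, is listening.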

Replay System

There are two kinds of state changes in Anoma Node:

1. User-input states, i.e. state changes of the form change_state_X_to_Y, which set the :X field of the actor to the value Y. Example: set_timer(X) sets the timer for the pinger execution to X (or something like that).
2. Anoma-generated states, i.e. all other states. Semantically, these are all states which change deterministically as a result of user input, yet are not directly identical to that input. In other words, the state changes are “caused” by Elixir functions computing on the user-specified input, or on some input which reaches them via a causal chain from the user input. Example: a user submits transaction code using Mempool.tx. The resulting state changes are produced by “internal computations” (if you accept the separation between user and system).

Problem: snapshotting. Getting state from all the actors synchronously via state-dumps can cause synchronization failures.

Solution:

a. Forget about state information of type 1, i.e. user-specified state. The user should know what they want to set it to: they either have the info in a config file (maybe best to have all such info in config) or set it directly.
b. For all state changes achieved via 2, we replay them from a specified checkpoint.

What we mean by checkpoint: this can vary, but a good time to take one is block commit.

What we mean by replay: only a limited number of user inputs trigger state changes of type 2, specifically Mempool.tx and Mempool.execute. If we save, in a linear fashion, which transactions were submitted when (with which IDs and in which order) and when execute commands were sent, we can ask the system to recompute all the submitted transactions on top of the latest checkpoint, which brings us back to a consistent state.

This isn’t particularly groundbreaking: log-and-checkpoint is the way every replicable database works. Unfortunately, mistakes were made and “state dumping” was pursued for a while.
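A minimal log-and-checkpoint sketch in Python, under loud assumptions: `Node`, `ReplayLog`, and `apply_entry` are hypothetical, the toy `tx` and `execute` only mirror the names Mempool.tx and Mempool.execute without reproducing their behaviour, and "state" is reduced to a dictionary of balances.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: tx queues work, execute applies it deterministically."""
    balances: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def tx(self, tx_id: int, account: str, amount: int) -> None:
        self.pending.append((tx_id, account, amount))

    def execute(self) -> None:
        for _, account, amount in self.pending:
            self.balances[account] = self.balances.get(account, 0) + amount
        self.pending = []


def apply_entry(node: Node, entry: tuple) -> None:
    # Shared by the live path and the replay path, so both behave the same.
    if entry[0] == "tx":
        node.tx(*entry[1:])
    elif entry[0] == "execute":
        node.execute()


@dataclass
class ReplayLog:
    """Checkpoint plus the ordered inputs received since that checkpoint."""
    checkpoint: tuple = ((), ())
    entries: list = field(default_factory=list)

    def record(self, entry: tuple) -> None:
        self.entries.append(entry)

    def take_checkpoint(self, node: Node) -> None:
        # For example at block commit; older log entries are no longer needed.
        self.checkpoint = (tuple(node.balances.items()), tuple(node.pending))
        self.entries = []

    def restore(self) -> Node:
        # Rebuild state from the checkpoint, then replay the logged inputs.
        balances, pending = self.checkpoint
        node = Node(balances=dict(balances), pending=list(pending))
        for entry in self.entries:
            apply_entry(node, entry)
        return node
```

Nothing here asks a running actor for its state. Recovery only needs the last checkpoint and the ordered list of inputs, which is what avoids the synchronization problem of dumping state from every actor at once.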

Philosophical Difference on Transparent RM

Being discussed actively in this thread:

Philosophical View on Checkpointing

On new block creation, the block itself is a good time for a checkpoint.

If a block is not created, we can roll back the events and come up with new ones that satisfy consensus.

It’s not precisely a rollback, since consensus outcomes are also events, but it’s similar to one in that the state we reset to comes from the last checkpoint.

It would be good if specs could discuss this, our latest scry changes:

includes a good point for actually making this work from the codebase.

In some threads we might overload “Block” to mean “checkpoint” where in specs it means “narwhal root node” or something; we should fix this language.

Hardware Abstraction Machine

Doesn’t seem to interact with the system at large and does things which can’t be made distributed; it’s unclear what this is for. V2 might make it clearer.

Measurement Engine

The following section is personal opinion inserted by Jeremy, I don’t really have a comment:

The idea of trying to use meta information to influence decisions is nice, however I think it ought to be thought out from a holistic POV.

The talk I gave above about trying to understand reflective design, and examples like CLOS and Smalltalk, are a good starting point for doing something like this correctly.

Identities

This is discussed at https://github.com/anoma/anoma/issues/571

Anything else?

What do specs v2 drafters have in mind?

cwgoes · September 4, 2024, 1:03pm

Thank you for this excellent overview. Let’s go over these in detail next week, but briefly:

I think your model of deletion is correct - the specs should be updated to match it.
We need to specify scry semantics; the linked threads are a good start. The specs should also be updated to match implementation details (of which messages are sent to which engines etc.)
Everything transport and P2P-related needs to be synchronized between specs, @tg-x’s work, and engineering. We can do a session specifically for this next week.
Ditto for pub/sub. This one should also be synchronized with @graphomath and @jonathan’s work on engines.
I think we need sessions on replay/checkpointing and transparent RM as well.
I don’t understand the point about the measurement engine. We can cut it from the specs if it’s not useful right now.
