Networking discussion

Notes on our discussion on networking and ids from yesterday. @cwgoes @isheff @tg-x @graphomath CC @nzarin

My original proposal

Let’s get rid of engine ids because:

  • I think an external id is a trust domain, but different engines within the same node are not in different trust domains—a single node is a trust domain. Only a node should have an external id, and engines should be identified some other way
    • The current way is operationally annoying, not just conceptually annoying, because we have to reason about the trust associated with the identities to which we send and from which we receive messages
  • Currently, we sometimes send to topics and sometimes to engines; let’s make a uniform ‘topic’ abstraction and say that every message is sent to a topic, so that way we don’t have to reason about multiple different types of destinations

Currently, a topic contains one identity, which identifies who is allowed to send to it. TG agreed with me at the last hacker house that a topic needs to additionally include a uuid, in order to enable having multiple different topics with the same set of allowed senders. (‘UUID’ means ‘universally unique identifier’; generally, a 128-bit or 256-bit cryptographically random number.) My proposal: let’s add one more field, which is an external id specifying allowed recipients, i.e.:

struct topic:
    allowed-senders: external-id
    allowed-receivers: external-id
    uuid: uuid

(Messages sent to the topic would, generally speaking, be encrypted so only the allowed-receivers can decrypt them.) Under this new system, an ‘engine’ would be identified by a topic where:

  • allowed-senders is the ‘all’ identity
  • allowed-receivers is the id of the node where the engine lives
  • the uuid would be some uuid specific to that engine (to distinguish it from other engines on the same node)

A pubsub topic would be identified by a topic where:

  • allowed-senders is some identity (we already have this for pubsub topics)
  • allowed-receivers is the ‘all’ identity
  • the uuid is some uuid specific to the pubsub topic

And we can of course imagine pubsub topics where allowed-receivers is somewhat restricted—this seems like a useful thing to be able to do.
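To make the two cases above concrete, here is a minimal Python sketch of the proposed topic structure. It is only an illustration: the `ALL` constant, the use of plain strings for external ids, and the helper names `engine_topic` and `pubsub_topic` are assumptions, not anything we agreed on.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

# Hypothetical stand-in for the 'all' identity; its real encoding is undecided.
ALL = "all"


@dataclass(frozen=True)
class Topic:
    allowed_senders: str    # external id of who may send (a plain string here)
    allowed_receivers: str  # external id of who may receive
    uuid: UUID              # distinguishes topics with the same sender/receiver ids


def engine_topic(node_id: str, engine_uuid: UUID) -> Topic:
    """An 'engine': anyone may send, only the hosting node may receive."""
    return Topic(allowed_senders=ALL, allowed_receivers=node_id, uuid=engine_uuid)


def pubsub_topic(publisher_id: str, topic_uuid: UUID) -> Topic:
    """A pubsub topic: restricted senders, anyone may receive."""
    return Topic(allowed_senders=publisher_id, allowed_receivers=ALL, uuid=topic_uuid)


mempool = engine_topic("node-A", uuid4())
news = pubsub_topic("publisher-1", uuid4())
```

A pubsub topic with restricted receivers would just be a `Topic` where neither field is `ALL`.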

Another somewhat convenient thing this enables (but does not enforce) is standardised uuids. Suppose we agree that f51f2ae7ff1816e4 is a standard uuid for mempools: somebody decides on a specification for the behaviour of mempools and generates the uuid f51f2ae7ff1816e4, with the convention that mempools will use it. Then, if I want to talk to a given node’s mempool, it suffices for me to know that it has a mempool; I can just send a message to the topic with uuid f51f2ae7ff1816e4 at that node, rather than having to learn not only that it has a mempool but also the specific id of its mempool.
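As a rough sketch of the standardised-uuid idea: if the mempool uuid is a shared constant, the topic for any node's mempool can be built from the node id alone. The `Topic` tuple, the `ALL` constant, and the `mempool_topic` helper are hypothetical names for illustration; the uuid is kept as an opaque hex string exactly as written above.

```python
from typing import NamedTuple

ALL = "all"  # hypothetical encoding of the 'all' identity

# The example value from this post, treated as an opaque hex string.
MEMPOOL_UUID = "f51f2ae7ff1816e4"


class Topic(NamedTuple):
    allowed_senders: str
    allowed_receivers: str
    uuid: str


def mempool_topic(node_id: str) -> Topic:
    """Build the topic for a node's mempool knowing only the node id."""
    return Topic(allowed_senders=ALL, allowed_receivers=node_id, uuid=MEMPOOL_UUID)


# No per-node lookup of a mempool-specific id is needed.
target = mempool_topic("node-B")
```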

Subsequent discussion

Went on for a long time—I will write everything I remember but please weigh in if you said something and I forgot. Or if you have new or different opinions or arguments, of course. Also not entirely chronological, but a mix of chronological, topological, and increasing complexity. Also prioritising saying things over making them consistent.

TG: engine message/topic message was supposed to be unicast/multicast and is operationally different.
Elijah: but unicast is a special case of multicast where there happens to just be one recipient.

Elijah: multicast involves complicated routing, but unicast also involves complicated routing, and we shouldn’t necessarily expect that the latter will always be uniformly simpler than the former (even if it tends to be in general). Further, we can have a special case in the routing logic for the case where a topic’s allowed recipients are just a single node (and implicitly assume that such a topic is always subscribed).

TG: shows routing table (top right whiteboard) where a given id can map to a topic or an engine
Chris: in that case these are already being somewhat conflated so I lean somewhat towards agreeing that a unified abstraction makes sense
Elijah: (top right whiteboard topic|engine) that is—we currently have routing table entries which are a topic or an engine, so just let them always be a topic
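A minimal sketch of what "routing table entries are always topics" could look like, including the single-node special case mentioned earlier. Everything here is assumed for illustration: the `Router` class, string ids, and in particular the simplification that any receiver id other than `ALL` names exactly one node (restricted groups are not modelled).

```python
from typing import NamedTuple

ALL = "all"  # hypothetical encoding of the 'all' identity


class Topic(NamedTuple):
    allowed_senders: str
    allowed_receivers: str
    uuid: str


class Router:
    def __init__(self) -> None:
        # Every routing table entry is keyed by a topic; no separate engine entries.
        self.subscribers: dict[Topic, set[str]] = {}

    def subscribe(self, topic: Topic, node_id: str) -> None:
        self.subscribers.setdefault(topic, set()).add(node_id)

    def next_hops(self, topic: Topic) -> set[str]:
        # Special case: a topic whose receiver is a single node is treated as
        # always subscribed by that node (assumption: non-ALL means one node).
        if topic.allowed_receivers != ALL:
            return {topic.allowed_receivers}
        # General case: whoever has subscribed to the topic.
        return set(self.subscribers.get(topic, set()))
```

With this shape, "unicast" is just the branch where the receiver set is known without any subscription state.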

TG: if a message dest. includes a node id then we do not need engine adverts which is another operational advantage

TG: but it’s kind of annoying that now I cannot just send a message to an id (I don’t understand this complaint but I remember there was something like it)


More interesting further discussion

We tentatively agree that some degree of change and likely unification is requisite. But we’re discovering ontological problems that need to be resolved: What are ‘node’ and ‘id’? What concepts do we have (do we need more concepts than just those?), and what words should we use for them?

Node

We might want to have multiple different computers with a single logical thing spread over them (for ex. single logical mempool, mempool workers on different computers). Should each computer have a distinct ‘node id’ or no? Probably not. We can have an id node1 OR node2 OR node3 for contexts where we want to treat the group as a single trust domain. We want to have some notion of a ‘routing domain’, that is, something to which messages can be sent over the network (a single computer). We tentatively agreed that a node is a routing domain. (Next-day discussion—perhaps Isaac disagrees.)

Routing

Elijah: I think ‘id’ means ‘trust domain’.
Isaac: I think ‘id’ means ‘something to which you can send messages’. (Writes ‘an X is …’ on the bottom of the whiteboard, then says ‘I posit that an X is an id’)
Elijah: but a topic is not an id and we can send messages to it too
Isaac: a topic is an id (i.e. message target) plus some access controls
(I don’t understand why we can’t use a more uniform general mechanism where all message targets have associated access controls, and we sometimes have the degenerate case of no restrictions on access.)

(I didn’t say this at the time, but: if an id just denotes something to which I can send messages, then I want something that’s just a trust domain? Further discussion with Isaac today: maybe there shouldn’t be something that’s just a trust domain. Routing to compositional identities is confusing. Maybe some identities are message targets and some are something else that we don’t know what they are yet. Also maybe ‘node’ ≠ ‘routing domain’; I forget why.)

Isaac: we can do your thing where engine has the same cryptographic key as the node it lives on while also saying you always send to an id. Add a special compositional id id-with-uuid(id, uuid). (Or go for: pubkey-with-uuid(ed25519-pubkey, uuid), or whatever.)
Elijah: I guess it works but it’s aesthetically displeasing and implications for signs-same-as/reads-same-as is weird. (I guess if ‘id’ means ‘message target’, then signs-same-as/reads-same-as is not equivalent to is-same-message-target.)
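Here is a small sketch of the compositional id and of why the two notions of sameness come apart. The class names, the use of raw bytes for keys and uuids, and my reading of signs-same-as as "backed by the same key" are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PubKeyId:
    pubkey: bytes  # e.g. an ed25519 public key, as raw bytes


@dataclass(frozen=True)
class IdWithUuid:
    base: PubKeyId  # the node's key, shared by every engine on that node
    uuid: bytes     # distinguishes engines on the same node


Id = Union[PubKeyId, IdWithUuid]


def signing_key(i: Id) -> bytes:
    """The key that backs an id, ignoring any uuid component."""
    return i.base.pubkey if isinstance(i, IdWithUuid) else i.pubkey


def signs_same_as(a: Id, b: Id) -> bool:
    # Assumed reading: two ids sign the same iff the same key backs them.
    return signing_key(a) == signing_key(b)


def is_same_message_target(a: Id, b: Id) -> bool:
    return a == b


node = PubKeyId(b"node-key")
mempool = IdWithUuid(node, b"uuid-1")
consensus = IdWithUuid(node, b"uuid-2")
```

Under this reading, `mempool` and `consensus` sign the same but are different message targets.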

Elijah: a message should have a destination id and a delivery hint? (For example, send to the ‘all’ id and have a hint ‘only to people who’ve said they want to subscribe to uuid blablabla’.)
Isaac: meh. maybe

Chris: node is a routing domain
Elijah: what is that—how identified, concretely?
Chris: idk
Elijah: uuid? cryptographic key?
Chris: routing and trust domains shouldn’t be conflated—should be orthogonal. So could be uuid, sure
Elijah: is a routing domain uniquely associated with at least one trust domain (and possibly non-uniquely associated with more trust domains)?
Chris: shouldn’t necessarily because orthogonal
Elijah: but a routing domain may have trust associated with it—connection is to a routing domain and has associated authentication (tg agrees—isaac still wants to only send to ids)
Chris: maybe somewhat convinced

We want some notion of a ‘user’. Somebody has 5 devices. Something differentiates them but there should be something they have in common. (Is a ‘device’ or ‘node’ always associated with exactly one user?)


Metacomments: I think it’s somewhat clear what the ontological problems are, but not entirely clear what the solution is. This forum post is not very clear.

Cryptographic IDs

We should eliminate signatures on each sent/recvd message for efficiency reasons as public key operations on each message are costly and unnecessary.
It’s sufficient to rely on forward-secure transports and mutual authentication of node IDs after establishing a transport channel.
Since all engines of a node belong to the same trust zone, additional cryptographic engine IDs and signatures are not necessary after establishing a mutually authenticated transport channel.

Requirements

Engine IDs should be:

  • provably globally unique
  • drawn from an ID space large enough for all engines on all nodes
  • not easily guessable, to avoid spam/DoS

Possible options instead of cryptographic IDs:

UUIDs

As defined in UUID, these only offer probabilistic uniqueness, and there’s no way to enforce uniqueness across nodes, so they are unsuitable for routing purposes.

Hash-based IDs

We could use engine IDs derived from the cryptographic hash of the node ID and engine instance name, optionally using a keyed hash (e.g. BLAKE3 allows this):
H(key, node_id || engine_name)

However, the only way to prove that the hash is derived from the node_id and engine_name is to transmit them separately.
Thus we arrive at the next option.
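For the hash-based option above, a minimal sketch of the derivation and of why verification needs the inputs. BLAKE3 is not in the Python standard library, so this uses keyed BLAKE2b from `hashlib` as a stand-in. The length prefix on `node_id` is an added assumption to make the concatenation unambiguous, and the function names are hypothetical.

```python
import hashlib
import hmac


def engine_id(key: bytes, node_id: bytes, engine_name: bytes) -> bytes:
    """H(key, node_id || engine_name), with keyed BLAKE2b standing in for BLAKE3."""
    # Length-prefix node_id so (b"ab", b"c") and (b"a", b"bc") hash differently.
    prefix = len(node_id).to_bytes(4, "big")
    data = prefix + node_id + engine_name
    return hashlib.blake2b(data, digest_size=32, key=key).digest()


def verify(eid: bytes, key: bytes, node_id: bytes, engine_name: bytes) -> bool:
    # The verifier must be given node_id and engine_name: the hash alone
    # reveals nothing about which inputs produced it.
    return hmac.compare_digest(eid, engine_id(key, node_id, engine_name))


key = b"k" * 32
eid = engine_id(key, b"node-A", b"mempool")
```

Since `verify` needs `node_id` and `engine_name` anyway, sending them in place of the hash loses nothing.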

Tuples

The simplest and most efficient option is to use the following tuple as engine IDs:
(node_id, engine_name)

This offers several advantages:

  • simpler routing
    • no need for EngineAdvert messages
    • no need for routing table entries for engines on remote nodes
    • routing incoming messages based on locally known engine names
    • routing outgoing messages based on remote node IDs
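A rough sketch of how routing could work with tuple IDs: outgoing messages are dispatched on the node ID only, and incoming messages on the locally known engine name. The `Node` class, the `outbox` list standing in for a transport, and the handler signature are all hypothetical.

```python
from typing import Callable, NamedTuple


class EngineId(NamedTuple):
    node_id: str
    engine_name: str


class Node:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        # Only local engines are tracked; no entries for engines on remote nodes.
        self.engines: dict[str, Callable[[bytes], None]] = {}
        # Stand-in for the transport: (next-hop node id, destination, payload).
        self.outbox: list[tuple[str, EngineId, bytes]] = []

    def register(self, engine_name: str, handler: Callable[[bytes], None]) -> None:
        self.engines[engine_name] = handler

    def send(self, dest: EngineId, payload: bytes) -> None:
        if dest.node_id == self.node_id:
            self.deliver(dest, payload)
        else:
            # Outgoing: route on the remote node ID alone.
            self.outbox.append((dest.node_id, dest, payload))

    def deliver(self, dest: EngineId, payload: bytes) -> None:
        # Incoming: route on the locally known engine name.
        handler = self.engines.get(dest.engine_name)
        if handler is None:
            raise KeyError(f"no local engine named {dest.engine_name!r}")
        handler(payload)
```

Nothing here requires learning about remote engines in advance, which is why the EngineAdvert messages become unnecessary in this sketch.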

Uses

This way we can have globally unique engine IDs, which are more flexible than before.
It also allows well-known / public engine IDs, where the engine name could be e.g. “transport” or “router”, and private ones that are a (keyed) hash or are based on cryptographic IDs.

We can also have multiple engine IDs point to the same engine instance,
e.g. one based on some crypto key if needed, another based on a name.

This allows more flexibility in that certain engines can have crypto IDs but not necessarily all.
For instance pub/sub topics and domain engines could still have crypto IDs,
and such crypto IDs could also be a topic or domain ID instead of a publisher/subscriber or member ID, i.e. no need to possess the crypto key in order to route a message to the engine that can handle it.

couple of possibilities that came up in the meeting this morning:

  • have a per-node secret. use low-entropy engine names internally, but derive high-entropy external names as H(node-secret || internal-name). (low-entropy name as something more akin to a pid, though unfortunately can’t be a literal pid because we can’t serialise them)
  • use structured, hierarchical engine names, e.g. (a, i) as the name for the ith engine spawned by engine a
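a quick sketch of both possibilities. H wasn't pinned down in the meeting, so SHA-256 here is an assumption, as are the fixed 32-byte secret (which keeps the concatenation unambiguous) and the names `external_name` and `Spawner`.

```python
import hashlib
import secrets

# Per-node secret, generated once per node (assumed 32 bytes, fixed length).
NODE_SECRET = secrets.token_bytes(32)


def external_name(node_secret: bytes, internal_name: str) -> str:
    """High-entropy external name: H(node-secret || internal-name)."""
    return hashlib.sha256(node_secret + internal_name.encode("utf-8")).hexdigest()


class Spawner:
    """Hierarchical names: the i-th engine spawned by engine a is named (a, i)."""

    def __init__(self, name) -> None:
        self.name = name
        self.count = 0

    def spawn_name(self):
        child = (self.name, self.count)
        self.count += 1
        return child


root = Spawner("root")
first = root.spawn_name()                # ("root", 0)
grandchild = Spawner(first).spawn_name() # (("root", 0), 0)
```

the two could be combined by serialising a hierarchical name and feeding it to `external_name`, but that's not something we discussed.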