Physical nodes vs virtual nodes

I propose introducing a distinction between physical nodes and virtual nodes, which I think will help clarify various discussions around nodes, identities, engines, and naming. The concepts are as follows:

A physical node is a particular physical computer run within and by an agent. Specifically, the agent dedicates a certain amount of storage, compute, and bandwidth to the node (these amounts can change over time) and configures it with their preferences; the node then runs autonomously using those resources, periodically soliciting or responding to inputs from the agent. Physical nodes are addressable only by other physical nodes that are physically connected to them, which address them by physical connection identifiers such as an IP address or a Bluetooth ID. Note that this is a weaker naming requirement than IP: we do not require that these names have meaningful semantics to any physical node other than the one to which they were given. In general, parties external to a particular physical node have no way to verify whether two distinct names correspond to the “same” physical node or not.

A virtual node is a network subgraph of physical nodes which collectively know a particular internal identity, in the sense that they can produce a valid signature for the corresponding external identity attesting to the association with the physical nodes included in the subgraph. How exactly they produce that signature may vary: it could come from many physical nodes simply keeping copies of the internal identity, or from a more sophisticated threshold cryptography protocol. Note that a single physical node that generates a keypair is also a virtual node (just a subgraph of 1). Virtual nodes are addressable by their external identity, which is a global identifier.
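To make the two definitions concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class names (`PhysicalNode`, `VirtualNode`, `InternalIdentity`, `ExternalIdentity`), the `label` field, and the membership message format are hypothetical, and a toy Lamport one-time signature built from SHA-256 stands in for whatever signature scheme would actually be used. It shows the "subgraph of 1" case, where a single physical node generates a keypair and signs an attestation of membership.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """The 256 bits of SHA-256(message), used to select key halves."""
    return [(byte >> i) & 1 for byte in _h(message) for i in range(8)]


@dataclass(frozen=True)
class ExternalIdentity:
    """Public half: a global identifier that anyone can verify against."""
    hashes: tuple

    def verify(self, message: bytes, signature: tuple) -> bool:
        if len(signature) != 256:
            return False
        return all(
            _h(part) == self.hashes[i][bit]
            for i, (part, bit) in enumerate(zip(signature, _bits(message)))
        )


@dataclass(frozen=True)
class InternalIdentity:
    """Secret half (toy Lamport key: safe for ONE signature only)."""
    pairs: tuple

    @staticmethod
    def generate() -> "InternalIdentity":
        return InternalIdentity(tuple(
            (secrets.token_bytes(32), secrets.token_bytes(32))
            for _ in range(256)
        ))

    def external(self) -> ExternalIdentity:
        return ExternalIdentity(tuple((_h(a), _h(b)) for a, b in self.pairs))

    def sign(self, message: bytes) -> tuple:
        return tuple(self.pairs[i][bit] for i, bit in enumerate(_bits(message)))


@dataclass
class PhysicalNode:
    label: str  # hypothetical label, used only inside this sketch
    # Local connection names -> neighbours. These names are meaningful
    # only to this node, mirroring the weak naming requirement above.
    links: dict = field(default_factory=dict)
    identities: list = field(default_factory=list)


@dataclass
class VirtualNode:
    external: ExternalIdentity
    members: list  # the physical nodes in the subgraph

    def membership_message(self) -> bytes:
        # Hypothetical attestation format: the sorted member labels.
        labels = sorted(m.label.encode() for m in self.members)
        return b"members:" + b",".join(labels)


# A single physical node that generates a keypair is a virtual node of size 1.
internal = InternalIdentity.generate()
laptop = PhysicalNode("laptop", identities=[internal])
vnode = VirtualNode(internal.external(), [laptop])

message = vnode.membership_message()
signature = internal.sign(message)
is_valid = vnode.external.verify(message, signature)
```

In the multi-member case, each physical node could hold a copy of the same `InternalIdentity`, or the signing step could be replaced by a threshold protocol; the verifier only ever sees the `ExternalIdentity`.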

cc @isheff @tg-x @Moonchild @nzarin for feedback here

I think a virtual node (or logical node, or just "node") is the only meaningful kind of node. Consider that we all have multicore CPUs these days; do we even know which parts of anything are executing on which cores? We can find out, but we don’t because we don’t care.

We similarly don’t care if a sysadmin decides to separate those parts by a length of Ethernet cable, right? It seems, I think wrongly, like something we might care about because the cable is easier to clip, but my fancy CPU is also half-burned out thanks to Intel’s shoddy microcode, which is approximately similar.

In general I agree, but I think we still want the conceptual distinction, if only to make it clear which one we mean. Also, I still think a concept of a physical node is necessary to reason about a domain of control, right? We don’t care about the difference between Ethernet cable and an on-die interconnect, but we care about the difference between two sysadmins.

To me this distinction between a physical and a logical node makes sense and is in line with the way we think about nodes and connections in P2P land. We usually refer to identifiers in the physical network as addresses and to identifiers of virtual nodes as node IDs.

However, I am a little bit confused by what you (@cwgoes) mean by control here.

The point of an overlay with virtual nodes is to have additional control. For example, by assigning virtual nodes identifiers from any desired (global) identifier space, we can implement custom routing and dissemination protocols tailored to our specific needs. In contrast, we have limited control over how packets are routed and transmitted over the physical network.
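As one illustration of what a chosen identifier space buys us, here is a minimal sketch of greedy next-hop selection under an XOR distance metric. The choice of XOR distance, the use of SHA-256 to derive IDs, and the function names `node_id`, `xor_distance`, and `next_hop` are all assumptions made for this example, not something the proposal prescribes.

```python
import hashlib


def node_id(external_identity: bytes) -> int:
    """Map an external identity into a 256-bit identifier space (assumed)."""
    return int.from_bytes(hashlib.sha256(external_identity).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Distance between two node IDs in the overlay's identifier space."""
    return a ^ b


def next_hop(known_peers: list, target: int) -> int:
    """Pick the known peer whose ID is closest to the target ID."""
    return min(known_peers, key=lambda peer: xor_distance(peer, target))


# Small hand-picked IDs so the distances are easy to check by eye.
peers = [0b0001, 0b0100, 0b1000]
chosen = next_hop(peers, 0b0101)
```

The routing decision here depends only on node IDs, which the overlay defines, and not on addresses in the physical network.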

Ah, here by “control” I meant “physical control” - in the sense that in order to have confidence about what a particular physical computer will do, I need to have physical control over it. If I give my phone to you, you may reprogram it to do arbitrary things, regardless of what I programmed it to do originally.

Thanks for the clarification. If we want to make a distinction between physical and virtual nodes, we have to be very explicit and elaborate in the specs on what these two concepts mean and how they differ. We cannot rely on the reader to go and search for good motivation and explanation, as it is likely buried in literature from the 2000s.
