[this post is still WIP, posting to ensure the WIP content is saved on the cloud]
Post preface
Let's try to extend this "as-if invisible hand" concept to more areas, at a high level. So far, we've covered pub/sub. In this post, I will try to sketch:
- network relay,
- service tracking,
- storage,
- indexing,
- and ordering.
Network relay
Imaginary algorithm
The imaginary relay node would implement the following algorithm (a rough code sketch follows the list):
- Connect to all nodes over any physical connections it has available.
- Periodically measure these connections for latency and bandwidth capacity.
- When a relay request is received, relay the message to the connected node directly using the most suitable physical connection (given bandwidth/latency preferences and measurements).
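As a purely illustrative Python sketch of this: the `Connection` record, the `probe` callable, and the `send` primitive below are hypothetical stand-ins for the actual networking layer.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    """A hypothetical handle on one physical connection to a peer."""
    peer_id: str
    latency_ms: float = float("inf")  # last measured round-trip latency
    bandwidth_mbps: float = 0.0       # last measured bandwidth capacity

def send(conn: Connection, message: bytes) -> None:
    """Assumed wire-level send primitive (elided)."""
    ...

class RelayNode:
    def __init__(self) -> None:
        # peer_id -> every physical connection we hold to that peer
        self.connections: dict[str, list[Connection]] = {}

    def measure_all(self, probe) -> None:
        """Periodically refresh measurements; `probe` is an assumed
        callable returning (latency_ms, bandwidth_mbps) for a connection."""
        for conns in self.connections.values():
            for conn in conns:
                conn.latency_ms, conn.bandwidth_mbps = probe(conn)

    def relay(self, recipient: str, message: bytes,
              prefer_latency: bool = True) -> None:
        """Relay over the most suitable connection, given preferences
        and the latest measurements."""
        conns = self.connections.get(recipient)
        if not conns:
            raise LookupError(f"no direct connection to {recipient}")
        best = (min(conns, key=lambda c: c.latency_ms) if prefer_latency
                else max(conns, key=lambda c: c.bandwidth_mbps))
        send(best, message)
```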
Distributed state correspondence
State distributed around the network should correspond to:
- the state kept by this node (physical connections, latency, bandwidth)
- metadata about which other nodes have which connections (or, more generally, are likely to be able to reach which recipients), to be used in forwarding messages when a node does not itself have a connection to the desired message recipient
Similar to pub/sub, we should be able to phrase this distributed algorithm in terms of obligation-passing, where a node forwards the relay obligation (marking off routes it has already tried) until the recipient is reached.
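A minimal sketch of that obligation-passing, building on the relay node above; `node.id`, `peers_likely_to_reach` (the routing-metadata lookup), and `forward` (the actual network hop) are assumed helpers:

```python
from dataclasses import dataclass, field

@dataclass
class RelayObligation:
    """An obligation to deliver `message` to `recipient`."""
    recipient: str
    message: bytes
    tried: set[str] = field(default_factory=set)  # node ids already attempted

def handle_obligation(node, obligation: RelayObligation) -> None:
    # If we hold a direct connection, fulfill the obligation ourselves.
    if obligation.recipient in node.connections:
        node.relay(obligation.recipient, obligation.message)
        return
    # Otherwise, mark ourselves off and pass the obligation onward to a
    # peer we believe can reach the recipient (per the routing metadata).
    obligation.tried.add(node.id)
    for peer in node.peers_likely_to_reach(obligation.recipient):
        if peer not in obligation.tried:
            node.forward(peer, obligation)
            return
    # No untried routes remain; reporting failure upstream is elided.
```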
Service tracking
Imaginary algorithm
The imaginary service tracking node would implement the following algorithm (sketched in code after the list):
- When a new service commitment is created, store it.
- Measure all network activity.
- When a service information query is received, return any matching service commitments and performance data (the query can filter by commitments and performance data). Performance data is calculated on the fly as necessary.
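A minimal sketch, assuming commitments and activity events are simple records and that the querier supplies predicate functions for the filtering; the performance computation is a deliberately trivial stand-in:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Commitment:
    provider: str
    service: str
    terms: str  # promised latency, uptime, etc.; kept abstract here

class ServiceTracker:
    def __init__(self) -> None:
        self.commitments: list[Commitment] = []
        self.activity_log: list[dict] = []  # all observed network activity

    def record_commitment(self, c: Commitment) -> None:
        self.commitments.append(c)

    def observe(self, event: dict) -> None:
        """Record one unit of observed network activity."""
        self.activity_log.append(event)

    def query(self,
              matches_commitment: Callable[[Commitment], bool],
              matches_performance: Callable[[dict], bool]) -> list:
        """Return matching (commitment, performance) pairs; performance
        data is computed on the fly from the raw activity log."""
        results = []
        for c in filter(matches_commitment, self.commitments):
            perf = self._performance(c)
            if matches_performance(perf):
                results.append((c, perf))
        return results

    def _performance(self, c: Commitment) -> dict:
        # Stand-in performance computation: count the provider's events.
        relevant = [e for e in self.activity_log
                    if e.get("provider") == c.provider]
        return {"observed_events": len(relevant)}
```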
Distributed state correspondence
State distributed around the network should correspond to:
- the state kept by this node (observed network activity, known service commitments)
- metadata about which other nodes:
- may have observed which network activity
- may have computed and recorded which intermediate performance data
- may be aware of which service commitments
All of this metadata is, in general, to be used to fulfill queries over service commitments and performance data as closely as possible to how the imaginary service tracking node would have done, given that the state is distributed.
An analogous obligation-passing model can be constructed.
Distributed storage
Imaginary algorithm
The imaginary storage node would implement the following algorithm (sketched in code after the list):
- When a data blob is provided in a storage put request, store it, recording the specified expiry date in an index which is efficient to scan in order.
- When a data blob is requested by a storage get request, look up the hash in the database, and return the blob (if present, else nothing).
- Periodically scan through the expiry date index and delete any blobs which have expired.
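A minimal sketch of this node, keying blobs by content hash and using a min-heap as the expiry-date index, so the periodic scan only ever touches entries that have already expired:

```python
import hashlib
import heapq
import time

class StorageNode:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                # hash -> blob
        self.expiry_index: list[tuple[float, str]] = []  # min-heap of (expiry, hash)

    def put(self, blob: bytes, expires_at: float) -> str:
        """Store a blob, recording its expiry in an index that is
        efficient to scan in order (here, a min-heap)."""
        digest = hashlib.sha256(blob).hexdigest()
        self.blobs[digest] = blob
        heapq.heappush(self.expiry_index, (expires_at, digest))
        return digest

    def get(self, digest: str) -> bytes | None:
        """Look the hash up; return the blob if present, else None."""
        return self.blobs.get(digest)

    def expire(self, now: float | None = None) -> None:
        """Periodically called: pop and delete everything already expired.
        (Extending a blob's expiry via a second put is elided here.)"""
        now = time.time() if now is None else now
        while self.expiry_index and self.expiry_index[0][0] <= now:
            _, digest = heapq.heappop(self.expiry_index)
            self.blobs.pop(digest, None)
```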
Distributed state correspondence
State distributed around the network should correspond to:
- the state kept by this node (the actual data blobs)
- metadata about which other nodes are storing which data blobs and for how long, to be used in order to fulfill requests to query blobs as close as possible to how the imaginary storage node would have done, given that the state is distributed
An analogous obligation-passing model can be constructed.
In general, this question of metadata tracking ("which nodes know what") is one where we may want to investigate more efficient organizational schemes, in particular those which reduce storage costs from linear (node X knows data with hash Y) to sublinear. Any such scheme must necessarily organize the data in some further fashion, perhaps abstractly with predicates (node X knows all data which satisfies predicate P), and/or probabilistically (node X is likely to know data which satisfies predicate P). We should investigate this further.
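One concrete probabilistic scheme along these lines is a per-node Bloom filter: instead of one metadata entry per (node, hash) pair, each node gossips a fixed-size bit array summarizing the hashes it stores, which yields "node X is likely to know data with hash Y" at sublinear cost. A minimal sketch:

```python
import hashlib

class BloomFilter:
    """Fixed-size summary of a set of blob hashes: membership queries
    may yield false positives, but never false negatives."""
    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive `num_hashes` bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_contains(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A querying node would then route blob requests only to peers whose advertised filter matches the hash, falling back to a broader search on a false positive.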
Distributed indexing
We assume here that indexes can range over (project arbitrary computations over):
- Resources (current & historical)
- Transaction candidates & transactions
- Data referenced therein (stored in the distributed blob storage)
The imaginary indexing node would implement the following algorithm (sketched in code below):
- Subscribe to all processing results of all controllers, such that the indexing node can detect whenever a new block is produced.
- When an index subscription request is received, store the projection function provided therein associated with the identity of the node who made the request.
- When an index unsubscription request is received, delete the previously stored projection function and node identity.
- When a new block is produced by any controller, iterate over the stored projection functions, recompute their results, calculate deltas vs previous results (which must be stored), and notify the subscribed nodes about the deltas (queueing and aggregating unacknowledged delta notifications).
Thereby, any subscribing node can maintain a synchronized local copy of an arbitrary projection of state, receiving only delta changes (which are aggregated even while it is offline).
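A minimal sketch, where `diff`, `aggregate`, and `notify` are deliberately trivial placeholders for a structural differ, a delta combiner, and an acknowledged network send:

```python
def diff(previous, current):
    """Trivial placeholder differ: the new result if it changed, else None."""
    return current if current != previous else None

def aggregate(deltas: list):
    """Trivial placeholder combiner: keep only the latest queued delta."""
    return deltas[-1] if deltas else None

def notify(subscriber_id: str, delta) -> bool:
    """Assumed network send; returns True once the delta is acknowledged."""
    ...
    return True

class IndexingNode:
    def __init__(self) -> None:
        # subscriber id -> (projection function, last computed result)
        self.subscriptions: dict[str, tuple] = {}
        self.pending: dict[str, list] = {}  # queued deltas per subscriber

    def subscribe(self, subscriber_id: str, projection) -> None:
        self.subscriptions[subscriber_id] = (projection, None)

    def unsubscribe(self, subscriber_id: str) -> None:
        self.subscriptions.pop(subscriber_id, None)
        self.pending.pop(subscriber_id, None)

    def on_new_block(self, state) -> None:
        """Called whenever any controller produces a block: recompute every
        stored projection, diff against the stored previous result, and
        notify subscribers of the deltas."""
        for sub_id, (projection, previous) in self.subscriptions.items():
            current = projection(state)
            delta = diff(previous, current)
            self.subscriptions[sub_id] = (projection, current)
            if delta is not None:
                self.pending.setdefault(sub_id, []).append(delta)
                self.flush(sub_id)

    def flush(self, sub_id: str) -> None:
        """Deliver queued deltas, aggregating while unacknowledged."""
        queued = aggregate(self.pending.get(sub_id, []))
        if queued is not None and notify(sub_id, queued):
            self.pending[sub_id] = []
```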
Questions:
- Should pub/sub be used as a component here (updates for specific projection functions)?
[todo: state correspondence for distribution approximation]
Distributed ordering
This one is probably the trickiest to abstract in this manner. I'm not 100% sure about the following, but it's my best guess at the moment.
There are a few levels of granularity in play here. First, the most basic: a single imaginary ordering node which orders all transactions.
The imaginary ordering node would implement the following algorithm:
- Initialize a counter at 0.
- When a transaction request is received, associate it with the current counter value, increment the counter by one, and produce an attestation that associates the transaction with the counter value.
- Repeat forever.
Here, we assume that all resources are tracked by this imaginary ordering node, and a transaction can reference any resources it likes.
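In code, this imaginary node is little more than a counter (a sketch, with attestation signing elided):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attestation:
    tx_hash: str
    sequence: int  # the transaction's position in the single global order

class OrderingNode:
    """The single imaginary ordering node: a monotonic counter."""
    def __init__(self) -> None:
        self.counter = 0

    def order(self, tx_hash: str) -> Attestation:
        """Associate the transaction with the current counter value,
        increment, and produce an attestation (signing elided)."""
        attestation = Attestation(tx_hash=tx_hash, sequence=self.counter)
        self.counter += 1
        return attestation
```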
Now, let's introduce the constraint of heterogeneous trust. To implement ordering under this constraint, we:
- partition the resources, so that each resource is associated with exactly one controller, where users have trust constraints and preferences that refer to controllers
- limit each transaction to consuming resources associated with a single controller (it can still create resources on any controller; we elide the details of state sync at this level of abstraction)
We still have an imaginary ordering node, but it now processes transactions subject to these restrictions. This imaginary ordering node can then be approximated by an actual distributed set of controllers.
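A sketch of the restricted node, with per-controller counters and the single-consuming-controller check; cross-controller state sync is elided as above:

```python
class PartitionedOrdering:
    """Ordering under heterogeneous trust: each resource is associated
    with exactly one controller, and a transaction may only consume
    resources of a single controller."""
    def __init__(self) -> None:
        self.resource_controller: dict[str, str] = {}  # resource -> controller
        self.counters: dict[str, int] = {}             # per-controller counters

    def order(self, consumed: list[str], created: dict[str, str]):
        """`consumed` lists resource ids; `created` maps new resource ids
        to the controller they are created on."""
        controllers = {self.resource_controller[r] for r in consumed}
        if len(controllers) != 1:
            raise ValueError("a transaction may consume resources"
                             " of only one controller")
        (controller,) = controllers
        seq = self.counters.get(controller, 0)
        self.counters[controller] = seq + 1
        # New resources may be created on any controller (state sync elided).
        for resource, ctrl in created.items():
            self.resource_controller[resource] = ctrl
        return controller, seq
```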
[todo: state correspondence for distribution approximation]
Several points are elided in this high-level description that we might want to cover in some fashion:
Amortization
Ordering blocks (ordered lists of transactions) instead of ordering transactions directly is an amortization technique to increase throughput at the (slight) expense of latency, assuming that producing attestations carries a cost (as it does in distributed consensus). Modifying the model to fit this requires that the imaginary ordering node (sketched in code after the list):
- Accept transaction requests and note their order.
- Periodically, produce a block containing the most recently received batch of transaction requests in the previously noted order.
(note: the batch of transactions received in a certain period could be ordered differently)
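A sketch of the modified node; when exactly `produce_block` fires (the period) is left to the caller:

```python
class BatchingOrderingNode:
    """Amortized ordering: note transaction order on arrival, but attest
    to whole blocks periodically rather than to individual transactions."""
    def __init__(self) -> None:
        self.pending: list[str] = []  # tx hashes in arrival order
        self.height = 0

    def accept(self, tx_hash: str) -> None:
        """Accept a transaction request and note its order."""
        self.pending.append(tx_hash)

    def produce_block(self) -> dict:
        """Called periodically: emit the pending batch, in the noted
        order, as one block covered by a single attestation."""
        block = {"height": self.height, "transactions": list(self.pending)}
        self.pending.clear()
        self.height += 1
        return block
```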
This amortization technique can be applied at any of these levels of abstraction.
Scheduling & load-balancing
In general, users will have resources and some preferences about which controllers they trust and what kind of rough distribution of their resources they wish to maintain across controllers. One could imagine an ordering node which takes a batch of transactions (referencing any resources) and figures out how to move resources between controllers (subject to trust constraints) in a way which would allow that batch to be processed. At some level, the actual system should act like this, but with individual users making the decisions about how to move their resources. I am unsure how helpful it is to model this dynamic here, but it might be worth exploring.
A final note: in order to maintain, or at least gradually converge to, their preferred resource-controller distribution, users should select which controllers they create resources on (and perhaps sometimes even split them, e.g. tokens). In the controller model, this is "free" modulo amortized state synchronization.