I think the most underdetermined remaining part of application interface standardization is figuring out how applications interact with service commitments (promises to store data, perform computation, perform ordering, etc.). In particular, I would expect that applications rely on service commitments in the following ways:
In order to maintain state over time, applications rely on storage services.
In order to access state efficiently, applications rely on availability services (serving data over the P2P layer), which will likely be tightly coupled with some compute services to index over that data and serve computed results.
In order to compute longer-term aggregate statistics, applications will rely on perhaps a separate subclass of compute services.
In order to provide counterparty discovery for user intents, applications will rely on compute and bandwidth services (i.e. solvers).
In order to order user transactions, applications will rely on ordering services.
I think we should be able to clearly elucidate what services a specific application requires by:
describing the state of the application (which will be sharded over the network) and who is expected to store each part at each point in logical application time
describing the services the application wishes to provide to users, which will entail availability, compute, networking, and ordering services
We will need to figure out which parts of these definitions belong in which exact data structures - the same application (in the resource machine sense) can be coupled with different sets of promises chosen by different users. Perhaps it would be helpful to introduce a notion of an “application service configuration” or something like this. With a sufficiently clear definition of a service configuration, we should be able to analyze whether or not an application can “keep promises” - of a more gestalt form - to its users, on the basis of whether the service providers keep the constituent component promises which are part of the application service configuration.
You mention applications relying on storage, data availability, compute, bandwidth, and ordering services. What is the difference between these services? Are they expected to have different properties (if so, what properties are expected from each kind?), or can they just be described as compute + reduced-functionality-compute services?
Does the application decide who provides all of the services above or only some of them? I see it is at least responsible for choosing who works with its state
How do networking services (that the application can choose to provide) relate to the other services from the list? Asking because we don’t expect the application to rely on them but to provide them to the users
What is the relationship between the services the application provides for the users and the services the application relies on? Is the application in some sense an intermediary between the service providers and the users, so by comparing what the application promises to the users and what services it relies on we can tell if it can actually deliver what it promises?
At the highest level of abstraction, all of these services can be described as promises about what messages to send or not send in response to other messages over time, so in that sense they can be unified, but there are important sub-categories, I would say maybe four:
storage services concern (a) storing a particular piece of data for a certain time period, and (b) responding promptly to requests to retrieve that data (so-called “data availability”)
compute services concern performing computation and possibly providing proofs-of-correctness upon request
ordering services concern ordering transactions on request and maintaining safety (not double-signing)
bandwidth services concern sending network messages between various nodes upon request
These services will often be colocated (i.e. performed by the same physical node) for efficiency reasons, but they are conceptually and analytically distinct.
I think we probably want to reify services as first-class objects within the resource machine, i.e. represent promises to provide particular services and requests to obtain particular services as resources, in which case yes, applications can reason about the promises concerning their state. That said, there should still be a distinction between the application definitions (resource logics, transaction functions, projection functions) and choices of who should provide what services, which should always be up to the user. Maybe we will want some notion of an ApplicationServiceConfiguration which can often be packaged along with the other parts of an application definition, or something like that. I also expect that applications will often be configuring services on the basis of who has permissions in the application itself - e.g. in multichat, users with publishing rights to a channel should perhaps also be responsible for storing messages, or choosing themselves another party to do so.
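To make that concrete, here is a rough Haskell-flavoured sketch of what reifying services and an ApplicationServiceConfiguration might look like - all type and field names here are placeholders, not a settled design:

```haskell
-- Placeholder sketch only, not the actual resource machine API.
type Hash = String              -- stand-in for a content address
type ExternalIdentity = String

data ServiceKind
  = StorageSvc   { storedBlob :: Hash, retainFor :: Integer }  -- seconds
  | ComputeSvc   { intentClass :: Hash }
  | OrderingSvc
  | BandwidthSvc { maxBytes :: Integer }

-- A promise to provide a service, which could itself be a resource.
data ServiceCommitment = ServiceCommitment
  { provider :: ExternalIdentity
  , service  :: ServiceKind
  , expiry   :: Integer         -- logical or wall-clock time
  }

-- Pairing an application definition (by reference) with user-chosen
-- providers; distinct from the resource logics themselves.
data ApplicationServiceConfiguration = ApplicationServiceConfiguration
  { applicationId :: Hash
  , commitments   :: [ServiceCommitment]
  }
```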
I’m not sure I entirely follow this line of thought, can you expand a bit? Do you have an example application and example networking services in mind?
An application is not an intermediary in the network sense - it’s not a physical node - and as such it does not itself technically issue promises or provide services. I guess maybe a better way to phrase what I mean is that an application should often come with definitions as to what users should be able to do in interacting with it, which will require various services, and users should be able to reason about - given known promises to provide services - whether the application can indeed provide for those kinds of interactions or not.
I’m asking because you didn’t mention networking services when listing the services applications rely on to function, but you did mention them when describing what applications provide for the users. I’m not really sure what these networking services are, and I don’t have any specific example in mind.
Some questions related to the ApplicationServiceConfiguration:
We store content-addressed BLOBs in the Anoma key-value storage, so the data is immutable.
How can we have dynamic data, e.g., a list of service providers changing over time?
Is this supposed to happen via the pub-sub system, i.e., could this configuration file refer to a topic ID or be a topic itself?
Service providers could potentially be discovered over the pub/sub system, yes, and one could most likely use patterns very similar to the intent gossip / counterparty discovery system - I think we can just express service commitments and requests in the resource machine as an application, which would make that easy to do.
I think perhaps a related question is the question of nameservices, i.e. how would users or applications be able to associate names of semantic import with particular content-addressed pieces of data such as blobs in storage or external identities. Here, there are at least two kinds of nameservices which might be useful:
Totally-ordered nameservices, not entirely unlike something such as ENS, which would be applications in the resource machine, where, say, each controller can maintain their own namespace. Totally-ordered nameservices would make sense for e.g. tracking new versions of a particular application frontend code.
Eventually-consistent nameservices, which would just be associations gossiped around the P2P network, and stored in a distributed fashion. See this subsection of the identity specs for some relevant descriptions, but we still need to fully specify this. Eventually-consistent nameservices would make sense for e.g. trying to reach another node associated with some name (where totally-ordered consistency is not required), trying to look up validator quorums, etc.
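As a toy illustration of the difference between the two flavours (types are made up):

```haskell
-- Toy illustration only; made-up types.
type Name = String
type Hash = String

-- Totally-ordered: lookups go through a controller's ordered state, so
-- readers at the same height agree on a unique binding.
newtype OrderedNamespace = OrderedNamespace (Name -> Maybe Hash)

-- Eventually-consistent: gossiped associations which may be stale or
-- plural; a lookup returns whatever this node has heard so far.
newtype GossipedNamespace = GossipedNamespace (Name -> [Hash])
```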
Does that help answer your question? I can take a further look at IPLD/IPNS as well soon.
Yes, at least partially. Maybe it helps if I provide more context.
Concretely, I was thinking about how an application could be loaded into the local storage/memory of the node when used for the first time (or after a longer period of inactivity). In this context, I imagined that there is an application configuration file pointing to all the data blobs required and already known for this specific application (version). This could include, e.g.,
frontend code
transaction, projection, and logic function objects
non-linear/constant resources (?)
as well as a list of providers that are expected to store these (and provide other services) as you wrote above:
I think we should be able to clearly elucidate what services a specific application requires by:
describing the state of the application (which will be sharded over the network) and who is expected to store each part at each point in logical application time
describing the services the application wishes to provide to users, which will entail availability, compute, networking, and ordering services
An example config file could look like this:

```
// Application Configuration
{
  "frontend": "Bytes32",                          // Blob CID
  "interface": {
    "transactionFunctions": "Map String Bytes32", // Name + Blob CID
    "projectionFunctions": "Map String Bytes32"
  },
  "resources": {
    "logicFunctions": "List Bytes32",             // Blob CIDs
    "nonLinear": "List Bytes32"
  },
  "services": {
    "storage": {}, // TODO: Unclear as it depends on how services are defined
    "compute": {},
    "ordering": {},
    "bandwidth": {}
  }
}
```
Some entries could change over time:
the list of providers changes over time
a newer frontend or an updated transaction function becomes available (effectively a new version)
With a name service, finding the latest application configuration file is straightforward.
However, this also could raise security concerns, e.g., if the application frontend or transaction function is switched out and suddenly behaves differently/maliciously.
If things are heavily composed, that could be more dangerous.
In my previous work, we built an on-chain setup process with a pre-approval mechanism before any installation, update, or uninstallation.
Is there an easier way to deal with this? I am realizing that we enter the complex topic of application versioning, upgradeability, and curation here.
I think we can just express service commitments and requests in the resource machine as an application
Oh, this sounds great!
I am now wondering what this could look like.
Let’s say my intent is to store a 1 GB blob for 1 year.
I call a storeBlob transaction function and provide
– my intent (“I give at most 10 Kudos. I want a StorageCertificate for my Blob that’s valid for 1 year.”)
– a reference to my blob (e.g., my own node ID and the key to the blob in my local key-value store)
A service provider matches my intent and it gets settled. The provider receives my Kudos and I receive a StorageCertificate resource referencing my blob address
Within one year I can get availability proofs from a projection function. If my blob is unavailable, I can call a transaction function that checks the availability again and allows me to claim a refund + compensation for the StorageCertificate.
After 1 year, the StorageCertificate expires and anyone can burn it.
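In (made-up) types, I picture roughly the following - storeBlob, StorageCertificate, and all fields here are assumptions, not an existing API:

```haskell
-- Assumed names throughout; sketch of the flow described above.
type Hash = String
type NodeId = String

data BlobRef = BlobRef { host :: NodeId, blobKey :: Hash }

data StorageIntent = StorageIntent
  { wantStored :: BlobRef
  , maxFee     :: Integer  -- "at most 10 Kudos"
  , duration   :: Integer  -- requested validity, e.g. 1 year in seconds
  }

-- The resource I receive when the intent settles.
data StorageCertificate = StorageCertificate
  { certBlob   :: Hash
  , certIssuer :: NodeId   -- the storage provider
  , validUntil :: Integer
  }

-- After expiry, anyone may burn the certificate resource.
expired :: Integer -> StorageCertificate -> Bool
expired now cert = now > validUntil cert
```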
Does this sound right? I find it harder to imagine how this would work for bandwidth, compute and ordering services.
I see, makes sense (in general). It’s a good intuition. I think we can split this into (at least) three distinct concerns, which I will call the service commitment cache, state-dependent provider lookup, and application frontend versioning:
Service commitment cache
I think it’s worth noting that many data blobs may be shared between different applications, so at least some of this data (e.g. who was last known to be storing a particular blob, who made a service commitment to store a particular blob, …) would make sense to keep in a local cache shared across all applications. Many service commitments will also be made in a way which is not strictly tied to a particular application - for example:
storage service commitments will be tied to particular data blobs (content-referenced by hash)
compute service commitments will probably be tied to classes of intents (so related to applications, but not necessarily a 1-1 correspondence)
ordering service commitments may be tied to assets used for fee payment and/or identities of the parties transacting or proofs which the parties are required to make in order to use a particular ordering service
packet relay service commitments may be similarly tied to assets used for fee payment, identities of the parties transacting, or other historical data
So rather - it seems to me - it might make sense to keep a “service commitment cache” which stores information about known service commitments - and use that cache in conjunction with the application definition and known references (e.g. data blobs) to look up service providers.
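A minimal sketch of such a cache, keyed per the list above (all names illustrative):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type Hash = String
type Provider = String
type Asset = String

data CommitmentKey
  = ByBlob Hash        -- storage: tied to a content address
  | ByIntentClass Hash -- compute: tied to a class of intents
  | ByFeeAsset Asset   -- ordering / packet relay: tied to a fee asset
  deriving (Eq, Ord)

-- Shared across all applications on this node.
type CommitmentCache = Map CommitmentKey [Provider]

-- Given a reference known from the application definition, look up
-- candidate providers in the local cache.
providersFor :: CommitmentCache -> CommitmentKey -> [Provider]
providersFor cache k = Map.findWithDefault [] k cache
```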
State-dependent provider lookup
That doesn’t cover all of what I think you’re describing here, though - in particular, you mention that service commitments can change over time, which is true. Another part of what I meant by “who is expected to store each part at each point in logical application time” is that these expectations may be state-dependent. Concretely, for example:
in multichat (a version without a designated storage provider), the sender of a message to a channel may be expected to store it for at least 7 days, and the receiver for at least 1 day
… so, if I want to fetch the message data, as a party syncing my local state, I should contact the sender and/or receiver (depending on time elapsed since the message)
In other words, often we will not be able to say which service providers are even relevant until we are talking about a particular piece of state about which we already know some information (in this case, the channel metadata and the message timestamp) - and - if I don’t already know this metadata - I will need to first fetch it before fetching the message data. This implies:
We will need a recursive synchronization procedure which, on each run, fetches the data whose providers we know, and repeats until all data is fetched (a toy version is sketched below).
For efficiency reasons (avoiding too many round-trips), we will probably want to come up with a way to represent and forward these complex queries as functions (so that e.g. a party who might have all the data already can serve the query using only their local cache and return all the results to you as a bundle). Luckily, this should be more or less the same problem as representing the application read interface that we’ve already been discussing.
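To illustrate the first point, a toy version of that recursive procedure, with the state-dependent fetch abstracted out (names are illustrative):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type Hash = String
type Blob = String

-- 'fetch' stands in for "look up a provider given current local state
-- and retrieve the blob"; it may only succeed once enough metadata
-- (e.g. channel metadata, message timestamp) is already in the cache.
sync :: (Map Hash Blob -> Hash -> Maybe Blob)  -- state-dependent fetch
     -> [Hash]                                 -- wanted data
     -> Map Hash Blob                          -- local cache
     -> Map Hash Blob
sync fetch wanted cache
  | Map.null newlyFetched = cache  -- fixed point: nothing more fetchable
  | otherwise             = sync fetch missing (Map.union cache newlyFetched)
  where
    missing      = [h | h <- wanted, Map.notMember h cache]
    newlyFetched = Map.fromList
      [ (h, b) | h <- missing, Just b <- [fetch cache h] ]
```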
Application frontend versioning
First, just to avoid confusion: I think it may be easier not to think of applications and frontends as abstractions which are particularly coupled. An application in Anoma is a set of interface definitions - which can be composed and decomposed - while a frontend (for now, before we have some sort of composable UI library) is a specific piece of data that may allow the user to interact with parts of one application or parts of many applications. The questions of upgrading, versioning, etc. for the frontend and for the application are very different - in particular, because the application state must be handled in the latter case. I think application state versioning deserves a separate treatment, so let’s just talk about frontend versioning for now here. I think this is quite related to the nameservice question I discussed in my post - a simple implementation could say that:
a particular frontend name is reserved by a particular party, represented by a non-fungible token resource
whenever they want to change what code that name points to, they consume the old NFT resource and make a new NFT resource with the same name and the new code reference (hash)
a second party who wants to use this party’s name can look it up and download the frontend code - then perhaps perform other kinds of verification on the frontend code (e.g. typecheck it)
More complex versions could add attestations to new code versions from different parties, which the user could then check according to their personal trust graph - but let’s start with something simple first.
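For instance (again with made-up types), the name resource and its update:

```haskell
-- Illustrative sketch of the simple name-NFT scheme.
type Name = String
type Hash = String
type Identity = String

data NameResource = NameResource
  { owner    :: Identity
  , name     :: Name
  , codeHash :: Hash   -- content address of the current frontend code
  }

-- Updating a name consumes the old resource and creates a new one with
-- the new code reference; in the real resource machine this would be a
-- balanced transaction, not a plain function.
updateName :: NameResource -> Hash -> NameResource
updateName old newCode = old { codeHash = newCode }
```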
Yes, although the part about checking the availability and claiming the refund + compensation hides a lot of distributed systems and game theoretic complexity that we’ll need to reason through - but in principle, one could imagine such refund (insurance) mechanisms existing, yes. The rest of the steps you describe match my understanding, except that as part of (1) or (2) you need to actually send the blob to the storage provider in question (somehow) - could be part of the transaction, or perhaps sent separately and just referenced.
I haven’t thought through everything in detail here (@Jamie and @nzarin are also thinking about these problems), but for example:
for bandwidth services, you could request that another node forwards the next 1 GB of packets you send them in the next month
for compute services, you could request that another node commits X hours of CPU-time on request for the next month
for ordering services, you could request that the consensus provider commits X units of gas on request for the next month
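All three examples fit a common quota-style shape, something like the following (all fields are assumptions):

```haskell
-- Assumed fields; just to show the common shape of these commitments.
data QuotaCommitment = QuotaCommitment
  { quota  :: Integer  -- 10^9 bytes, X CPU-hours, X gas units
  , unit   :: String   -- "bytes" | "cpu-hours" | "gas"
  , window :: Integer  -- commitment window in seconds, e.g. one month
  }

bandwidthExample, computeExample, orderingExample :: QuotaCommitment
bandwidthExample = QuotaCommitment (10 ^ 9) "bytes"     (30 * 24 * 3600)
computeExample   = QuotaCommitment 100      "cpu-hours" (30 * 24 * 3600)
orderingExample  = QuotaCommitment (10 ^ 7) "gas"       (30 * 24 * 3600)
```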
My thinking was that, because the storage provider has the reference to my blob from my intent, they can get it from my node, but maybe this is not how it works.
We definitely don’t want to put 1GB of data into the tx extra data and gossip it around until a counterparty is found.
Had a discussion with @Michael yesterday and we’re trying to come up with a coherent framework to integrate the ideas about service commitments from this thread with the application report.
After reading up on the discussion, does the following make sense as a high-level approach to tie the discussion together? Details to be worked out, where not already mentioned in this thread. @cwgoes @Michael @vveiln
Service Requirements in resource logic
This should describe the necessary data availability/storage, compute and bandwidth requirements, and potentially how caching, fallbacks or redundancy hierarchies for convenience and efficiency should be organized. This should be in the resource logic or at least tied to it.
Service Providers by reference
In addition to the service requirements, the logic (or the service requirements themselves) should be tied to a set of references to service providers.
A provider could be referred to in different ways (non-exhaustive list):
Fixed identities, effectively tying an application to a set of service providers.
Identities represented by resources, which could be updated via another application, e.g. a governance vote.
By role (e.g. sender/receiver, as mentioned above) with the identities being state dependent and dynamic
Services provided per TX via solving.
These 4 settings probably differ enough to warrant specifying requirements in the resource logic, s.t. the granularity of required service guarantees could be fully utilized.
We could also start with settings 1. and 2. for now, to think through how this should work in general, and then move on to the more complex/dynamic cases 3. and 4.
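As a type, the four settings might look like this (purely a sketch):

```haskell
-- Purely a sketch of the four settings as one sum type.
type Identity = String
type ResourceRef = String

data Role = Sender | Receiver  -- example state-dependent roles

data ProviderRef
  = Fixed Identity           -- 1. fixed identities in the resource logic
  | ViaResource ResourceRef  -- 2. identity held in an updatable resource
  | ByRole Role              -- 3. state-dependent and dynamic, by role
  | ViaSolving               -- 4. chosen per TX during solving
```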
Commitment checking
Depending on the services provided, sometimes we will be able to provide direct in-protocol proofs of adherence to commitment, and sometimes we will only have stochastic proofs or votes on adherence, e.g. via the slow game.
As mentioned above, the service requirements should specify in which ways non-adherence is to be handled, once it is detected. This could work similarly to the controller fallback mechanisms.
It’s not super clear to me who would do the commitment checking; maybe the logic could require proofs for TX validity, or there could be rewards for detecting non-adherence, to incentivize analysis of the application state.
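Roughly, the kinds of adherence evidence might be distinguished like this (purely illustrative):

```haskell
-- Illustrative distinction between kinds of adherence evidence.
type Hash = String
type Voter = String

data AdherenceEvidence
  = DirectProof Hash               -- direct in-protocol proof of adherence
  | StochasticProof [Hash]         -- sampled challenges / spot checks
  | SlowGameVotes [(Voter, Bool)]  -- votes on adherence, e.g. via the slow game
```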
This sounds like “SQL + locality annotations” (or a simplified query language) could serve this function in the general case, where locality annotations should probably look like controller trees, and indices would be the known metadata in addition to content addresses.
Then the recursive synchronization procedure (from above) would craft queries for this backend, which can be recursively resolved, where some cells on some nodes would only contain pointers to other nodes, s.t. a fetch for a “pointer cell” would not return the content but a component for the next query.
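In types, a result cell might look something like this (sketch):

```haskell
-- Sketch of a "pointer cell": either the content itself, or a component
-- for the next query against another node.
type NodeId = String
type Hash = String

data Cell a
  = Value a              -- content is available here
  | Pointer NodeId Hash  -- ask this node for this content address next
```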
This sounds to me like prior work on this should exist. Maybe @tg-x, @nzarin, or @isheff know something?
It seems like we want to develop some DSL for commitments, which maps to intents and is amenable to (local) solving.
It should be able to encode one-sided as well as two-sided commitments.
In the former case, a service provider would commit to provide a service, or a user to consume a service, given that some conditions are met.
These could be formulated as intents, where after they are matched, the user and service provider can (choose to) enter a two-sided provision/consumption commitment, given certain conditions are met.
For one-sided service provision announcements/requests, we probably want to have non-committing intents as well, s.t. a user can e.g. request quotes from different service providers.
Dimensions of the conditions could be:
Identity of provider and recipient
service rates, per denomination
durations of commitment (single use, duration with respect to some consensus or wall clock oracle)
state-dependent roles of provider and recipient
other state-dependent conditions
Given a setup like this, users could choose service providers from their cache according to their preferences by local solving, or do service discovery via the solving network.
Applications could include service commitments in the logic, or place them in an updateable resource, to be used as default or even exclusive providers.
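A first stab at the condition and commitment types for such a DSL, covering the dimensions listed above (everything here is provisional):

```haskell
-- Provisional sketch of conditions and commitments for the DSL.
type Identity = String
type Denomination = String
type Hash = String

data Clock = ConsensusClock | WallClockOracle

data Duration = SingleUse | Until Clock Integer

data Condition
  = CounterpartyIs Identity
  | Rate Denomination Rational  -- service rate, per denomination
  | For Duration
  | RoleIs String               -- state-dependent role of a party
  | StatePredicate Hash         -- reference to an arbitrary state predicate

data Commitment
  = OneSided
      { issuer     :: Identity
      , conditions :: [Condition]
      , binding    :: Bool  -- False: non-committing announcement / quote request
      }
  | TwoSided
      { providerId :: Identity
      , consumerId :: Identity
      , terms      :: [Condition]
      }
```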
Thanks for the updates here, following up on a few points:
I don’t really follow. First, isn’t the “reference” used for a particular service provider - in the sense of the name which we use to refer to that service provider - always an external identity? I infer that you must mean something different by the word “reference”, and I think what you mean is more like “pointer”, i.e. a place where users using an application would go to look up the external identity of the provider which is supposed to be providing a service (and maybe also where providers would look to figure out which services they are supposed to be performing). Is that right? If so, I think we need a word other than “reference”. If you mean something else, then I’m totally lost.
In terms of the list you provide here, I don’t understand the essential distinctions you’re going for. It seems to me that there is an essential distinction between provider identities which are fixed in resource logics (I read your point 1. as implying this) - and thereby tied to a particular application as part of what defines that application itself - versus provider identities which are not fixed in resource logics (and are instead fixed in storage, somewhere, which can then be more or less dynamic depending on whatever update rules govern how that storage can be updated). This latter case seems to cover your points 2 - 4 - did you intend to imply some sub-distinction between 2, 3, and 4? What sub-distinction?
It also seems to me that service provider identities must be either fixed in resource logics or fixed in (or calculated from, but this difference doesn’t matter) storage, so I don’t understand why this list would be non-exhaustive either. What other cases did you have in mind?
See this discussion, which covers formal definitions of what service commitments are and how to check whether they have been adhered to.
I agree, but I’m also not entirely sure that we’ll be able to find something better than the basic approach of compiling functions to circuits, such that the nodes in circuits can be separately cached data dependencies, or something like this. Maybe there is prior work along this line?
Yeah. The service commitment discussion linked above can help with parts of this, though I think that perhaps in parallel we should just build a few simple service commitments as applications, to play around with the concept - even if we don’t have the measurement part implemented yet, making the service commitments explicit is useful. wdyt @Michael?
Yes, pointer would be correct. We can use that, if it’s not too overloaded already.
2. would be represented by resources
3. would be dependent on metadata of messages
4. would be dependent on TX contents of messages
If you think we can collapse all of these into “storage dependent” as an abstract category, that makes sense, but it feels like the semantics might be subtly different, at least at the implementation level. For example, instances of the classes would have the following properties:
2. storage is handled by implicit requirements from the resource machine, and updating it as well
3. depends on some message history, which should be available to participants and should be made available to other users of the application
4. would always be available to parties who need to do anything with the TX, without retrieval being necessary, and it would probably be the most pertinent
3. and 4. seem like we can just collapse them into “storage dependent” + some implementation details, but I think if we do that to 2. we might lose expressiveness we get from it actually being state-dependent at the level of the resource machine.
Or do you mean that anything resulting from cases 3. and 4. that is relevant to the application must also be encoded at the RM level?
Thank you, this is helpful! Can we move this thread to a public category?
If we want to only have storage pointers in the circuit nodes, we can make the resolution of storage hierarchies independent of circuit evaluation and plug in the data after retrieval is finished. Something simple should be sufficient for now, but with this setup we should be able to evolve it as needed.
Agreed, we don’t need to fully plan out the DSL, and I assume that at least some of the progress on the information flow control side should be reusable here.
What do you mean by metadata? Which field is this metadata, in an intent / partial transaction? This sounds like something different than “with the identities being state dependent and dynamic” (from your original point 3.), at least to me.
Maybe I don’t understand exactly what you mean here. By “services provided per TX via solving”, do you mean that a service commitment is entered into for the duration of solving that does not persist beyond it, that services are provided (but no explicit service commitment is made), and/or that services are provided in line with prior service commitments (or something else entirely)?
Yes, I think we’re on roughly the same page. To do this properly, we need content-addressed functions, which hearkens back to the discussions here.
Sorry, I meant identities in the sense you just quoted.
I mean that a solver, in a first solving step, might find commitments or commitment options that the intent issuer might need to agree to for settling or even further solving of a TX.
The solver could help discover prior commitments, or show a service commitment request from the TX to potential service providers.
In a setting like this, we should be able to cover all three cases you just mentioned, if we are fine with multi-round protocols, as well as requesting and entering into agreements that persist beyond this TX.
Thanks. I think I’m convinced that there are distinctions which matter here, but I’m not yet sure exactly which ones. Let’s discuss this on Wednesday (feel free to continue noting thoughts here prior).