RSM Storage Permission Applications Pt. 1

As part of the series of discussions about synchronising RM and RSM storage, I attempt to describe how writes to RSM storage could be governed and how shielded this schema can be. What I want to describe:

  • storage permission application (SPA) design
  • how SPA can be used by other applications to write to RSM storage
  • how shielded the SPA mechanism can be (both SPA resources and guarantees for other applications that use SPA)

This post only covers the first point and hints at the second.

Assumptions/wants:

  • writes/deletes are governed by resource logics

  • these logics specify the path to the blobs they govern (implies no function privacy for such logics)

  • applications (both shielded and transparent) want to write to the blob storage and have some sort of access control

  • blobs to be written are included in the transaction and are validated by the responsible logic

These assumptions are treated like axioms: we don’t question how true they are; the proposal simply assumes they correctly represent what we want.

The schema

Storage partitioning

Let’s assume that the resource logics governing the writes (from now on: Storage Permission Logic, or SPL) partition the blob storage by the conditions under which anything can be written to or deleted from that part of the storage. We assume these conditions are unique; this is for simplicity of representation and doesn’t have to hold in practice.

Conceptually, it would look something like this:

Each SPL corresponds to a certain application. Let’s call any application of this type a Storage Permission Application (SPA). Only SPA applications can govern RSM storage writes.
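
To make the partitioning assumption concrete, here is a minimal Python sketch of it: every area of the blob storage is keyed by the hash of exactly one SPL, and the areas are disjoint. The `SplArea`/`BlobStorage` names are purely illustrative, not part of any actual RSM interface.

```python
from dataclasses import dataclass, field


@dataclass
class SplArea:
    """One partition of the blob storage, governed by exactly one SPL."""
    spl_logic_hash: str                                       # identifies the governing logic
    blobs: dict[str, bytes] = field(default_factory=dict)     # blob hash -> blob


@dataclass
class BlobStorage:
    """Blob storage partitioned into disjoint SPL-governed areas."""
    areas: dict[str, SplArea] = field(default_factory=dict)   # SPL hash -> area

    def area_for(self, spl_logic_hash: str) -> SplArea:
        # each area is created on demand and governed by a single logic
        return self.areas.setdefault(spl_logic_hash, SplArea(spl_logic_hash))
```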

How SPA could work

There are multiple possible ways to arrange SPAs; I suggest two versions.

Simpler version: creation of a resource is associated with blocking the relevant area of the storage; consumption of the resource unblocks the area (it can be written to again). So: create resource - write blob; consume resource - delete blob. Both transactions would have to be artificially balanced by ephemeral resources.

Less simple version: there is a storage application (SA) associated with the RSM that creates resources for each SPA and stores them publicly (e.g., in a dedicated area of the RSM storage). Whoever wants to write to the area associated with a particular SPA consumes the relevant resource. The transaction balances ephemerally; the ephemeral resource contains the blob to be written and the deletion criterion.

Version 1

SPA logic:

  • create:
    • non-ephemeral: request the blob write; verify the blob, the requesting party, and other constraints. Can be balanced by either an ephemeral or a non-ephemeral resource (i.e., delete another blob at the same time)
    • ephemeral: can only be created to balance non-ephemeral consumption
  • consume:
    • non-ephemeral: request the blob deletion, verify the requesting party and other constraints
    • ephemeral: can only be consumed to balance non-ephemeral creation

Advantages: simple and allows for dynamic deletion
Disadvantages: degenerate application design
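
A rough sketch of the Version 1 logic above, written as a single predicate over the resource being created or consumed and the action it appears in. The `Resource`/`Action` shapes and the helper predicates are assumptions made for illustration, not the actual resource machine interface:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    ephemeral: bool
    blob: Optional[bytes] = None            # blob to write (creation) or delete (consumption)
    requester_sig: Optional[bytes] = None   # signature of the requesting party


@dataclass
class Action:
    created: list[Resource]
    consumed: list[Resource]


def blob_is_valid(blob: Optional[bytes]) -> bool:
    return blob is not None                 # placeholder for application-specific checks


def party_is_authorised(sig: Optional[bytes]) -> bool:
    return sig is not None                  # placeholder for signature verification


def spa_v1_logic(self_res: Resource, is_creation: bool, action: Action) -> bool:
    if is_creation:
        if not self_res.ephemeral:
            # non-ephemeral creation == blob write request
            return blob_is_valid(self_res.blob) and party_is_authorised(self_res.requester_sig)
        # ephemeral creation is only allowed to balance a non-ephemeral consumption
        return any(not r.ephemeral for r in action.consumed)
    if not self_res.ephemeral:
        # non-ephemeral consumption == blob deletion request
        return party_is_authorised(self_res.requester_sig)
    # ephemeral consumption is only allowed to balance a non-ephemeral creation
    return any(not r.ephemeral for r in action.created)
```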

Version 2

SPA logic:

  • create:
    • non-ephemeral: permitted only for RSM storage providers (how to verify this? - dynamic set of entities), creates a ticket that can be consumed in order to store something in the SPA region
    • ephemeral: verify there is a blob to be written and a deletion criterion, both of which satisfy the corresponding constraints (blob format, identity, etc.)
  • consume:
    • non-ephemeral: permitted when the balancing ephemeral resource is valid
    • ephemeral: balances creation of the non-ephemeral ticket

Advantages: more meaningful use of the resource model, the deletion criterion is specified in advance
Disadvantages: the deletion criterion cannot be specified dynamically; we need to maintain an external data structure (or at least the abstraction of one) for the ticket pool
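
A rough sketch of the Version 2 (ticket) logic above, under the same illustrative assumptions as the Version 1 sketch; the storage provider set is a placeholder for whatever dynamic mechanism ends up verifying providers:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TicketResource:
    ephemeral: bool
    blob: Optional[bytes] = None              # carried only by the ephemeral write resource
    deletion_criterion: Optional[str] = None
    creator_id: Optional[str] = None          # set by whoever creates the resource


@dataclass
class Action:
    created: list[TicketResource]
    consumed: list[TicketResource]


STORAGE_PROVIDERS = {"provider-1", "provider-2"}  # placeholder for a dynamic set of entities


def blob_satisfies_constraints(blob: bytes) -> bool:
    return len(blob) > 0                      # placeholder for format/identity checks


def spa_v2_logic(self_res: TicketResource, is_creation: bool, action: Action) -> bool:
    if is_creation:
        if not self_res.ephemeral:
            # only storage providers may mint tickets for their area
            return self_res.creator_id in STORAGE_PROVIDERS
        # the ephemeral resource must carry a valid blob and a deletion criterion
        return (self_res.blob is not None
                and self_res.deletion_criterion is not None
                and blob_satisfies_constraints(self_res.blob))
    if not self_res.ephemeral:
        # a ticket may only be consumed if the balancing ephemeral resource is valid
        return any(r.ephemeral and r.blob is not None for r in action.created)
    # consuming the ephemeral resource balances the creation of a non-ephemeral ticket
    return any(not r.ephemeral for r in action.created)
```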

What should the write constraints include?

  • The blob is valid. The logic defines what it means for the blob to be valid: it might verify every field of the blob, some fields of the blob, or allow any blob to be written as long as other conditions are met.

  • The user is authorised to perform the write. The logic might allow only a selected set of parties to perform writes to the area. This might be verified with a signature associated with the user’s long-term key. Other forms of verification are possible, but we will assume this one because it is the simplest.

  • Other constraints. The logic might require additional conditions to be met, e.g., only permit a write at a specific time or only permit a write when some other resources are consumed. The latter option might also be used to implement the previous constraints.
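
A sketch of how these three kinds of constraints could be combined inside an SPL. The signature check, time window, and required resource kinds are placeholders; only the overall structure (blob validity, authorisation, other constraints) follows the list above:

```python
import time
from dataclasses import dataclass, field


@dataclass
class WriteRequest:
    blob: bytes
    user_pubkey: bytes
    signature: bytes
    consumed_resource_kinds: set = field(default_factory=set)


def check_write(req: WriteRequest,
                allowed_pubkeys: set,
                required_kinds: set,
                not_before: float,
                not_after: float) -> bool:
    # 1. The blob is valid (here just non-empty; a real SPL may check every field).
    if not req.blob:
        return False
    # 2. The user is authorised: the key is on the allow-list and a signature is
    #    present (actual signature verification over the long-term key is elided).
    if req.user_pubkey not in allowed_pubkeys or not req.signature:
        return False
    # 3. Other constraints: a time window and required co-consumed resource kinds.
    now = time.time()
    if not (not_before <= now <= not_after):
        return False
    return required_kinds <= req.consumed_resource_kinds
```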

How can SPA be used by other applications

All other applications do not deal with blobs themselves; instead, they bind their logics to SPA logics (similar to how intent applications work) and request a blob write by triggering the corresponding SPA application logic.

For example, an application A wanting to store a blob B as part of its logic requires the presence of the relevant SPA resource containing blob B in the same action. It must also check the correspondence between the relevant fields, e.g., the identity associated with the request, maybe the blob, etc. I will provide a more concrete example and a diagram in the next part.
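
In the meantime, a rough sketch of what such a binding check could look like, with assumed field names (`logic_hash`, `identity`) that are not the actual interface:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceView:
    logic_hash: str                  # which logic governs this resource
    blob: Optional[bytes] = None
    identity: Optional[str] = None   # identity associated with the request


def app_a_logic(expected_blob: bytes,
                expected_identity: str,
                spa_logic_hash: str,
                action_resources: list) -> bool:
    """Pass only if the same action contains a matching SPA resource for blob B."""
    return any(
        r.logic_hash == spa_logic_hash
        and r.blob == expected_blob
        and r.identity == expected_identity
        for r in action_resources
    )
```

The point is that A never touches the blob storage directly; it only asserts the presence of a correctly shaped SPA resource in the same action.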

Consequences

  • encapsulate storage operations in app logics of specific (essential) applications
  • remove deletion_criterion field from app_data

Next posts:

cc @isheff @degregat @cwgoes @Michael


Is there any reason these two types of SPA could not coexist? I assume they would need to write to different segments of storage, but that is fine.

They both provide distinct and useful functionality and tradeoffs, so it would be great if application developers could choose which of them fits their application model best.

They can co-exist, but I would prefer to have a primary one to refer to and explore further, similar to how it works for intent application design

Actually there is another reason for that. Since logics form the paths to blobs, assuming the paths are unique, only one logic can govern a given area. We can have different mechanisms for different areas, but they have to be unique per area and probably not so dynamic

Let’s assume that the resource logics governing the writes (from now on: Storage Permission Logic, or SPL) partition the blob storage by the conditions under which anything can be written to or deleted from that part of the storage

only one logic can govern a given area

I want to understand your assumptions behind the blob storage partitioning and what you mean by “area” better.

  1. Does partitioning mean that each blob is governed by one SPL?
  2. Your graphic tells me that SPLs can be nested (i.e., SPL6 and SPL7). How does this work?

I think it must be possible that the same blob is governed by multiple SPLs. E.g., Alice wants a data blob to be stored for 1 year and Bob wants the same blob to be stored until some other resource is consumed.

To achieve this, you’ll probably need a design where you maintain a set of multiple, independent SPLs per blob. The blob should only be deleted if the set becomes empty.
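
A tiny sketch of this suggestion, under the (hypothetical) assumption that storage keeps a set of independent deletion criteria per blob and only deletes the blob once the set becomes empty:

```python
class SharedBlobEntry:
    """A blob with a set of independent deletion criteria (one per interested party)."""

    def __init__(self, blob: bytes) -> None:
        self.blob = blob
        self.criteria: set = set()    # e.g. "keep until 2026", "keep until resource X is consumed"

    def add_criterion(self, criterion: str) -> None:
        self.criteria.add(criterion)

    def drop_criterion(self, criterion: str) -> bool:
        """Remove one criterion; return True if the blob may now be deleted."""
        self.criteria.discard(criterion)
        return not self.criteria
```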

My assumptions come from the state architecture report. There, blobs are the values under keys that contain a resource logic hash, a list of sub-labels, and the hash of the blob itself. This implies that only a single logic governs access to any given blob
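
To make that reading concrete, a sketch of the key shape (the hash function and separator are illustrative only); since the logic hash is part of the key, any given blob sits under exactly one governing logic:

```python
import hashlib


def blob_key(logic_hash: str, sub_labels: list, blob: bytes) -> str:
    """Key = governing logic hash / sub-labels / hash of the blob itself."""
    blob_hash = hashlib.sha256(blob).hexdigest()
    return "/".join([logic_hash, *sub_labels, blob_hash])


# e.g. blob_key("spl1", ["movies"], b"...") -> "spl1/movies/<sha256 of the blob>"
```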

I think I made it confusing: I didn’t mean they are nested, only that they are not all the same size/shape, which probably only makes sense in the context of a diagram. I think I’ll change the diagram

Alice and Bob here are user applications, not SPLs. Alice and Bob can trigger an SPL to do that; I’ll describe how it works in the next post I’m writing. I just posted this early to get some feedback.

Also, Bob and Alice defining different deletion criteria actually sounds rather complicated; they can just store their blobs separately. But multiple independent parties can access the same blob in the design I propose


I see, I haven’t read this one properly yet.

Hmm, “only a single logic governs access to any given blob” and “they can just store their blobs separately” sounds to me like we’ll end up with a lot of duplicated state @isheff if multiple users or apps want the same blob to be stored.

What if Alice and Bob want the same movie file to be stored on the same storage provider node, but with different deletion criteria?
IMO, the movie file should only be stored once, but with a set of deletion criteria. This would allow storage costs to be split and to become cheaper (at least for the period in which the criteria overlap – the calculation of which is the task of the storage service provider).

How does it happen that they want the same blob to be stored? Anyway, I assume the job of the storage is to store whatever the users pay for. If users can reuse another blob, why would they try to create another one?

It should be possible to merge deletion criteria together, but the current proposal doesn’t cover it. It is a good idea to add it, though. To be clear, I don’t mean it is too complicated in general; it is just that your example seemed like they wanted different things and could just use different blobs.

I think it is also a good idea to differentiate between storing something and using data stored in blobs

How would you enforce this? Would you perform the uniqueness check every time the users want to store something? Keep in mind that the blobs might be different, but the data stored in them might be the same. What is considered the same then: same blobs, same data (but what if the other fields are different?), or how do you define equality?

If you compare the blobs by all fields, then it doesn’t stop users from storing a movie twice - the other fields are different. If you compare only the data field, then the same value with different semantics can only be stored once. E.g., 5-the-age-of-my-dog and 5-number-of-fingers-on-my-left-hand would not be allowed to be stored at the same time unless they are explicitly annotated; but then you can go deeper and have the same semantic value with different logics, which is also not allowed to be stored because it is “duplicated”

Imo applications should enforce this by providing sensible usage logic: e.g., a movie application stores each movie once (if they wish to do it in the RSM storage at all, which is probably unlikely anyway), and then users can access it. If a user wants to store something anyway, it is up to them, but they might as well do it via a dedicated application that makes it much cheaper and makes doing it the other way unnecessary

I would say that two BLOBs are equal if the hash of the BLOB data is equal, whereas the metadata (e.g., deletion criteria and whatnot) can differ.
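
As a sketch of that equality notion (the hash choice is illustrative):

```python
import hashlib


def blobs_equal(data_a: bytes, data_b: bytes) -> bool:
    # equality on the data hash only; metadata (deletion criteria etc.) may differ
    return hashlib.sha256(data_a).digest() == hashlib.sha256(data_b).digest()
```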

This assumes that the user knows that it is already stored on the storage provider and that the deletion criteria are the same.
What if I don’t know that my sister is already storing the same family movie and I want to store it 2 years longer than she does?

I think there are many scenarios in which different entities want to store the same data independently and with different deletion criteria. Since we want to be a DOS replacing browsers, just think about caching of BLOBs being requested by many peers.

In terms of Anoma and apps, I imagine that different, independent parties want to store things like label, value, and logic BLOBs of resources being part of popular applications (or entire, compiled Juvix apps) with different deletion criteria.

Generally, deduplication is essential for data centers. Putting this outside of the storage protocol does not seem like a good idea. If we have the advantage that we can store data content-addressed and deduplicated, we should make use of it.

Data Deduplication helps storage administrators reduce costs that are associated with duplicated data. Large datasets often have a lot of duplication, which increases the costs of storing the data.

All your examples are about individual users interacting with the system and do not involve apps. If we are talking about a shared “resource netflix” account, the application would probably let you know about all the movies you already have, like regular Netflix does. If you do not have a shared “resource netflix” account with your sister, you probably want your own copy anyway (otherwise you would share the account with your sister).

I agree with the deletion criteria point and I’ll add it to the SPA functionality.

So if I store the Anoma whitepaper on my Google Cloud and you store the Anoma whitepaper on your separate Google Cloud account, is there really just one copy of the Anoma whitepaper stored? If that is the case, my views are indeed not up to date

Either way, I don’t define what the state assumptions are. My interpretation of the state architecture report is that there is a single logic that governs the blob’s status and so I base my proposal on that. During our discussions at the HH I was not corrected, so I assumed my interpretation is not wrong. If any part of it is wrong, of course I’ll change the proposal to account for that


During our discussions at the HH I was not corrected, so I assumed my interpretation is not wrong. If any part of it is wrong, of course I’ll change the proposal to account for that

Yes, makes sense. I was not part of the discussions at the HHH so I couldn’t bring them up.
However, when I practically implemented the blob storage v0.2 for the EVM protocol adapter, these questions came up since storage is very expensive on the EVM.

So if I store the Anoma whitepaper on my google cloud and you store the anoma whitepaper on your separate google cloud account, there is really just one copy of anoma whitepaper stored? If that is the case my views are indeed not up to date

Apparently, they deduplicate on the block / chunk level (1, 2).

It might be worth also looking at IPFS (which is quite popular in web3) where data is stored content-addressed but without real permissions / deletion criteria.


Deduplication is an implementation detail; as implementors, we’d do it for cases like /SPL1/blobA and /SPL2/blobA, but it doesn’t belong at this level of design; you’re still compliant with everything about storage applications if, for whatever reason, you instead store 2 separate copies, or 3 copies, or 8 copies (maybe you don’t trust your RAID controller?). It doesn’t matter at this level that blobA = blobA.
