Taxonomies of Available Data, or: consequences of Information Flow Control for application developers

In a discussion with @Michael yesterday it became clear to me that we should explicate what is meant by “observer dependence” and things being “contingent on data availability”.

It feels to me that a useful description of the possible case distinctions would be downstream of a (partial) specification of the information flow control system as we roughly envision it.

On the one hand, we might want to enumerate the potential case distinctions the design or implementation currently enables “close to the metal”.

On the other hand, we should revisit the model of IFC drafted here to check if it can map to all (or at least the desired) settings.

Somewhere in the middle, we probably want to describe some patterns with respect to how they will appear “in the wild” and be used in actual application implementations, ideally linking their explanations both to expressions in the abstract model and to the concrete primitives needed to implement them.

An incomplete enumeration draft could look like this:

  • Fully transparent TX: All plaintext for all resources available.
  • Fully transparent TX, partial local availability: Some plaintext available locally, the rest needs to be queried?
  • Mixed transparent/shielded TX: Plaintext for transparent resources available, proofs attached to nullifiers.
  • Fully shielded TX: All plaintext for all resources available, so proofs can be computed.

(Note: I’m not too familiar with the workings of shielded TXs and the ZK components, so the above might make absolutely no sense.)

Depending on the application, developers will need to implement code paths for each case, even if we provide more developer-friendly abstractions for these patterns, since the cases differ not only in implementation details but also in semantics.
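To make that concrete, here is a minimal Python sketch of branching on availability cases — all names (`TxVisibility`, `handle`) are hypothetical and not part of any existing interface; the cases mirror the draft enumeration above:

```python
from enum import Enum, auto

class TxVisibility(Enum):
    """Hypothetical case distinctions for observer-dependent data availability."""
    FULLY_TRANSPARENT = auto()    # all resource plaintexts available
    TRANSPARENT_PARTIAL = auto()  # transparent TX, some plaintexts must be queried
    MIXED = auto()                # plaintexts for transparent resources, proofs for shielded ones
    FULLY_SHIELDED = auto()       # only commitments/nullifiers and attached proofs

def handle(visibility: TxVisibility) -> str:
    # Each case needs its own code path, since the cases differ in semantics:
    # what can be read directly vs. what must be trusted via attached proofs.
    if visibility is TxVisibility.FULLY_TRANSPARENT:
        return "evaluate predicates over plaintext directly"
    if visibility is TxVisibility.TRANSPARENT_PARTIAL:
        return "query missing plaintext, then evaluate"
    if visibility is TxVisibility.MIXED:
        return "evaluate transparent logics, verify attached proofs for shielded parts"
    return "verify attached proofs only"
```

This is only meant to illustrate why the branches cannot be collapsed into one path, even behind an abstraction.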

We should also make explicit in the interfaces and documentation of Juvix that local DA can and should be used to populate, e.g., evaluation contexts for predicates over TXs, and likely provide utility functions to improve the quality of life for developers.

Some examples that arose yesterday:


nullifier ↔ resource correspondence

A TX struct contains nullifiers for consumed resources, as well as commitments for created resources.

Since commitments are the hash of the resource, the correspondence commitment → resource is clear, and the data can be retrieved, should it be available to the user making the request.

For nullifiers, that is less clear: If the resource and the nullifier key are available to a user, they can compute the nullifier.
Are there any cases where users are supposed to have access to nullifiers and resources, but not their keys? (@vveiln)
If that is the case, do we want to provide a map that ties nullifiers to resources?
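A small sketch of the correspondences discussed above, assuming hash-based commitments and nullifiers — the use of SHA-256 and the exact preimage layout are illustrative assumptions, not the actual RM construction:

```python
import hashlib

def commitment(resource: bytes) -> bytes:
    # cm = H(resource): commitment → resource is a plain hash lookup,
    # given that the plaintext is available to the requester.
    return hashlib.sha256(resource).digest()

def nullifier(resource: bytes, nullifier_key: bytes) -> bytes:
    # nf = H(nk || resource): without the nullifier key, the nullifier
    # cannot be recomputed from the resource plaintext alone.
    return hashlib.sha256(nullifier_key + resource).digest()

def nullifier_map(resources, keys):
    # A party holding both the resources and their keys can precompute
    # the nf → resource map asked about above.
    return {nullifier(r, k): r for r, k in zip(resources, keys)}
```

The map would only help observers who already hold plaintexts and keys; for anyone else the nullifier stays opaque, which is the point of the question above.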

mixed shielded / transparent TXs

In case only the nullifiers are available to an entity, it will not be able to compute any proofs, but if the proofs are shipped with them, they could be plugged into the validation of the TX.

For example, I get plaintext for some resources, but none for others. Then I can compute proofs for all the resource logics of the resources whose plaintext is available, and fill in some gaps using nullifiers and (succinct and/or zk) proofs provided with them.
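The fallback described above could be sketched like this — all names are hypothetical, and `verify` stands in for whatever succinct/zk proof verification is actually used:

```python
def validate_mixed_tx(consumed, plaintexts, attached_proofs, logic, verify):
    """For each consumed resource (identified by its nullifier): if the
    plaintext is available, evaluate the resource logic locally; otherwise
    fall back to verifying the proof shipped alongside the nullifier."""
    for nf in consumed:
        if nf in plaintexts:
            if not logic(plaintexts[nf]):
                return False
        elif nf in attached_proofs:
            if not verify(attached_proofs[nf]):
                return False
        else:
            return False  # neither plaintext nor proof: cannot validate
    return True
```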

How are proofs and nullifiers tied together? What information can be disclosed about the nullifiers and proofs, i.e. could they contain hints about the resource kind, s.t. transparent resource logics can use them during evaluation?


I assume that some people have thought about this already, but I am unaware of the current state of implementation or documentation of these concepts, so if this, or parts of it, are closed problems, I’m happy to just be linked to the artifacts and help unify the pieces into a coherent and accessible document.

Feedback by @mariari @Michael @vveiln @xuyang @paulcadman @cwgoes would be welcome


I just want to clarify that the post you linked is only talking about IFC for resource machine transactions. IFC is a broad concept and will be applied differently in different places in the protocol. It might be possible to come up with a high-level definition that describes IFC across the whole system that we can then implement in different components, but I haven’t done this yet (and afaik no one else has). In particular, the definition of IFC for resource machine transactions in the linked post says nothing about IFC for data availability, which concerns what storage requests a node storing a particular piece of data should or should not respond to. The definition in the linked post concerns only which other nodes a particular node should or should not forward a transaction to in the counterparty discovery process.

I don’t follow this. Developers should definitely not need to implement code paths for different cases of data being available, at least in terms of transaction data as you describe here. The application definition is independent of data availability - data available to a particular observer determines what that observer can see and do (e.g. in terms of computing proofs), but application definitions are independent of any particular observer. Developers may sometimes need to specify in application definitions (e.g. resource logics) what data should be stored by a particular party, and maybe even what IFC policies should be applied to that data - but this is still at the application definition level, not at the level of the view of a particular observer. What semantics are you referring to?

It feels to me like the mechanism should be the same for forwarding TXs during the counterparty discovery process as for managing data retrieval at other times: e.g., if these TXs can be forwarded to certain entities, including the plaintext s.t. they can compute proofs, those entities should also be able to do so before or after counterparty discovery.

I might be missing something and the latter case needs some more functionality, but I think at least for TXs, this is a good start for DA IFC.

It sounds to me like the app devs must make specific assumptions explicit and then structure the application accordingly?
For example, for kudo balance calculation, the app dev should never assume any particular amount of data is available and should just calculate a view from whatever can be accessed.
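Such an observer-dependent view might look like the following minimal sketch, assuming (purely for illustration) that resources are plain dicts with `kind` and `quantity` fields:

```python
def balance_view(available_resources):
    # Sum quantities only over resources whose plaintext this observer
    # can access; the result is an observer-dependent view of the kudo
    # balance, not a global balance.
    return sum(r["quantity"]
               for r in available_resources
               if r.get("kind") == "kudo")
```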

Another example regarding this setting:
If some function has strict data dependencies, the app dev would specify storage requirements and IFC policies, and if the assumptions do not hold for a specific observer, it reduces to error handling?

I understand better now what you mean by “observer dependent things should not happen at application level”, but it sounds like the app dev would still need to make choices (at least for defaults, which users could still change) on how a node should behave, i.e. how an error is handled.

After processing your answer, I think the semantics for the different settings of data (un)availability can be treated by making data-dependency assumptions explicit and doing error handling (does that make sense, @Michael @paulcadman?).

If I understand correctly, the transparent/shielded/mixed TX settings should really not be relevant to the app devs, but how the mixing would work is still unclear to me.
Do I just get a nullifier and a proof to plug in the slots of transparent predicates? Does the RM report talk explicitly about that? If so, can someone point me to the section (@vveiln ?). If not, do we consider this in scope of the RM, or a detail of node implementation?


How can we retrieve the resource plaintext associated with a commitment in practice?

In Constraint linearity in the resource machine - #36 by vveiln, @vveiln mentioned the specs:

Resource Logic Public Inputs:

  • $nfs \subseteq nfs_{tx}$
  • $cms \subseteq cms_{tx}$
  • $tag : \mathbb{F}_{tag}$ — identifies the resources being checked
  • $extra \subseteq tx.extra$

Resource Logic Private Inputs:

  • input resources corresponding to the elements of nfs
  • output resources corresponding to the elements of cms
  • custom inputs

Will these input resource plaintexts, being part of the private inputs, be in

  • the key-value map of the node, with the cm as the lookup key,
  • the key-value map in the tx extra data, with the cm as the lookup key, or
  • a set in the extra data (which would mean we have to iterate brute-force over them to find the matching one)?
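For comparison, the keyed-map options and the brute-force option differ roughly like this (SHA-256 as a stand-in for the actual commitment hash; function names are illustrative):

```python
import hashlib

def commitment(resource: bytes) -> bytes:
    # cm = H(resource), standing in for the real commitment scheme
    return hashlib.sha256(resource).digest()

def lookup_via_map(cm, kv):
    # options 1 and 2: a map keyed by cm gives a direct O(1) lookup
    return kv.get(cm)

def lookup_via_set(cm, resources):
    # option 3: a bare set forces an O(n) scan, re-hashing each candidate
    for r in resources:
        if commitment(r) == cm:
            return r
    return None
```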

In the v2 specs, we at least have the tag now as a logic function public input, which is either the commitment cm or the nullifier nf, depending on whether the resource is created or consumed, according to @vveiln (see her post).

But, as above and as @degregat writes, it is unclear how one can connect nullifiers with resource plaintexts outside the tag, in case we need to (a case I haven’t encountered yet), since we also lack the nullifier keys. Maybe a resource logic would want to check that certain properties of the other consumed resources hold, e.g., that none has a quantity > 10.
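Such a check could only run where the sibling plaintexts are actually available as private inputs; as an illustrative sketch (hypothetical names, resources as plain dicts):

```python
def logic_check_consumed(other_consumed_plaintexts):
    # A logic constraining sibling consumed resources needs their plaintexts
    # as private inputs; with only their nullifiers available, this check
    # cannot be evaluated at all.
    return all(r["quantity"] <= 10 for r in other_consumed_plaintexts)
```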


I can’t think of such cases. Afaik the only parties that have to see the plaintext are the parties who create the proofs associated with the resource. When the resource is created, the prover knows the plaintext but not the nullifier key or nullifier. When the resource is consumed, the prover knows the resource plaintext and knows both the nullifier key and nullifier. Both compliance proofs and logic proofs require checking the validity of the commitments/nullifiers, which implies the knowledge of the nullifier key to compute the nullifier.

One of the reasons why we have both a nullifier key and a nullifier, instead of having a commitment to a nullifier in the resource, is that the nullifier is computed from the resource plaintext (specifically, in Taiga the resource commitment is hashed into the nullifier), and if a commitment to the nullifier were part of the resource plaintext, there would be a cycle. In Taiga we want the nullifier to be dependent on the commitment, but I’m not sure if we care about that in the transparent case. Yet, to accommodate both cases, we want to keep these things decoupled.
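The dependency order can be sketched as follows — SHA-256 is only a stand-in hash here, and this illustrates the cycle argument, not the actual Taiga construction:

```python
import hashlib

def H(*parts: bytes) -> bytes:
    # generic hash, standing in for the scheme's hash functions
    return hashlib.sha256(b"".join(parts)).digest()

resource = b"plaintext fields..."
cm = H(resource)        # commitment is a hash of the plaintext
nk = b"nullifier key"
nf = H(nk, cm)          # nullifier depends on the commitment (as in Taiga)

# If instead a commitment to nf were a *field of* the resource plaintext,
# cm would depend on nf and nf on cm — a cycle. Keeping nk and nf outside
# the plaintext avoids it.
```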

I don’t understand what you mean here


Hmm, I see what you mean, but there is definitely some nuance here - if I send my intent to a solver, I don’t want them to store the data and respond to requests to retrieve it forever, most likely - maybe I want to instruct them to do that (for specified requesters authorized by the predicate) for a little while, yes, but there is some distinction here (and maybe one which indicates that we really do need a more general framework with which we can express counterparty discovery related IFC as a specific case).

That’s true, but I’m not sure how the app would “assume” data being available anyways, since the read interface is just a projection function over state - apps do not specify how to retrieve data, or who to try to retrieve data from, merely what data should be retrieved, declaratively - it’s a separate choice of the user, or the GUI acting on their behalf (probably with some configuration and idea of who has what data, who has made what service commitments, etc.) to make those decisions. Also see this discussion, which is very relevant. Maybe the GUI or “application user interface” should not assume that all data is available, that I would agree with, so developers must consider this possibility in that sense. We need some clearer terminology for the distinction between “application definition” and “application user interface”, I think. I’ve been using “application” to mean “application definition”, but I wonder now if this might be too in conflict with the intuitive expectation, and maybe we should always explicitly disambiguate.

Yes, the application user interface or GUI (whatever the user is using to help them craft intents, view state, etc.) would need to handle such exceptional cases, which could be considered to be errors. Furthermore, thanks to explicit service commitments, we should be able to provide specific information in every case as to which expectation was not in fact satisfied (e.g. I expected party P to store data D, and party P did not respond / did not have the data / etc.)
