Simple query interface

Transaction functions (when run post-ordering) need a way to query data, such as:

  • the current resource of kind k
  • the value at content-addressed storage location h (h is a hash)
  • possibly historical data or other data

I suggest that we add a very simple interface, where transaction functions are passed a pointer to a query function:

type QueryFunction := (Bytes -> Bool) -> Set Bytes

The query function takes a predicate over data blobs and returns the set of blobs which satisfy that predicate (a similar idea has been previously discussed here, and here). This signature is extremely general, but in practice we can start by supporting only a small, finite set of query functions, e.g.:

  • \x -> hash x == y (query content-addressed storage by hash)
  • \x -> inCurrentCommitments (hash x) && kind x == y (pseudocode - query the current resource with kind y)

The surrounding “resource machine shell” will simply analyze the provided function, see if it matches one of these specific ones, and fail if not. Then we can easily add more sophisticated query options in the future without changing the interface.
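
For concreteness, here is a minimal Haskell-flavoured sketch of that shell-side matching, with the supported query shapes made explicit as a small data type rather than recovered from an arbitrary closure. All type and function names here are illustrative assumptions, not part of any spec:

  -- Minimal sketch, not the spec: names below are illustrative assumptions.
  import Data.Set  (Set)
  import Data.Word (Word8)

  type Bytes = [Word8]   -- stand-in for a real byte-string type
  type Hash  = Bytes
  type Kind  = Bytes

  -- The finite set of query shapes the shell recognizes initially, instead
  -- of analyzing an arbitrary (Bytes -> Bool) predicate.
  data Query
    = ByHash Hash              -- \x -> hash x == y
    | CurrentByKind Kind       -- \x -> inCurrentCommitments (hash x) && kind x == y
    | Opaque (Bytes -> Bool)   -- anything else: not yet supported

  newtype QueryError = UnsupportedQuery String
    deriving Show

  -- Abstract storage backend the shell evaluates recognized queries against.
  data Storage = Storage
    { lookupByHash           :: Hash -> Set Bytes
    , currentResourcesOfKind :: Kind -> Set Bytes
    }

  -- A recognized query is evaluated; an unrecognized one fails, which in
  -- turn fails the transaction.
  runQuery :: Storage -> Query -> Either QueryError (Set Bytes)
  runQuery store (ByHash h)        = Right (lookupByHash store h)
  runQuery store (CurrentByKind k) = Right (currentResourcesOfKind store k)
  runQuery _     (Opaque _)        = Left (UnsupportedQuery "predicate not recognized")

Representing the supported shapes as data sidesteps the problem of inspecting opaque functions, and grows naturally into a small query DSL as more options are added.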

Thoughts? @vveiln @Michael @degregat @ray

What do you mean by “the current resource of kind k”? In which context would a transaction function query “current resources”? Are you thinking about, for example, a time-carrier resource as discussed in Time Constraints in Resource Logics - #7 by cwgoes?
Maybe examples could help me to understand the context. (I tried to look at the nspec repo to read more about the context but I didn’t find the right place. Can someone maybe point me to the relevant pages?)

My intuition is that queries run by controller/block proposers post-ordering must be fast.
Can these queries fail/return nothing so that the transaction becomes invalid? If this is the case, what happens with the transaction and all other, ordered transactions?


I mean “the resource of kind k which has been created but not yet consumed on this controller”. A typical example would be a counter resource.

Yes, these queries must be able to return results immediately (synchronously), so queries for unsupported functions will fail immediately, causing the transaction to fail (just as if you read from a non-existent storage location in the EVM, say).


So the shell takes a predicate as input and checks that it is one of the two hardcoded ones (for now)? Do we need that kind of check for other functions that have a more general signature but are in practice limited to a few options (e.g., the deletion criterion)? From the report perspective: since right now all functions of the resource machine are described at the core level, I’ll need to describe the connection between the shell and the core if we add this (just FYI).


Yes.

Yes, it’s the same idea.


The concept of “current” being confusing and problematic, we’ve avoided it. A transaction has a concept of “as of my own execution”, which is a lot more specific and actionable; transactions are serializable and there’s an as-if-serialized correct global system state corresponding to each transaction. No “current”; no one knows what this word means and they start doing nontransactional raw reads into private implementation details.

The concept of “fast” being undefined, we’ve avoided it. I know I measure microseconds a lot but this is a sanity check, performance tuning looks very different. Semantics don’t have time bounds outside real-time computing which is not relevant here; implementations have them.

The concept of “synchronous” is more promising, but you don’t want it either. It’s just a simple, normal failure semantic to fail sometimes. (It’s a “soft” fail, which despite being the soft one is the one that fails the whole transaction… a “nontermination” fail, really. The one that admits retries 20 years later.)

Moving on to the actual question, broadly speaking, defining it this way is fine at a high level but I note no one’s going to pass physical functions around and analyze them, Yoneda lemma notwithstanding; you’re looking at either a domain-specific query language or what amounts to some kind of compiler to same. (These are good things, this isn’t a dismissal as impractical!)

(And you do satisfy an RTOS’s semantics when you single-step through it nanosecond by nanosecond in your simulator. Not even that has a time bound. What you do have at a high level is the concept that some routes are foreclosed because they e.g. blow up combinatorially and can never be made efficient without a research breakthrough. But this looks, again, nothing like performance tuning…)

Agreed, I did not intend “current” to be read as a precise technical descriptor.

By “synchronous” I just mean “returns results immediately, from the perspective of the caller” - I think we’re on the same page.

The structure required for analysis from the perspective of indexing is probably not entirely dissimilar to a circuit - namely, one wants the data dependencies to be explicit - so I’m not sure that this is entirely infeasible, but we can start with a simple DSL.

I had a conversation with @isheff where he mentioned that, if we know the keys a function can read from and write to, parallelization is easy.
If the keys are not known up front but are data dependent, the functions have to be evaluated sequentially.
Maybe we can find ways to segment general queries, which are potentially data dependent, from more restricted queries, where the set of keys is known beforehand, such that users could be free to choose the degree of expressiveness they need, including the cost tradeoffs that come with it.
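
A rough sketch of that segmentation, in the same Haskell-ish pseudocode as above (all names are illustrative, nothing here is from the spec):

  -- Sketch only: segment queries by whether their key sets are known up front.
  import Data.List (intersect)

  type Key = String  -- placeholder for whatever keys the state machine uses

  data QueryClass
    = StaticKeys [Key]  -- keys known before execution: easy to parallelize
    | DataDependent     -- keys discovered during execution: schedule conservatively

  -- Two candidates may run concurrently only if both declare static key sets
  -- and those sets are disjoint.
  mayRunConcurrently :: QueryClass -> QueryClass -> Bool
  mayRunConcurrently (StaticKeys a) (StaticKeys b) = null (a `intersect` b)
  mayRunConcurrently _              _              = False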

In the execution engine spec as written, we assume the state machine’s state is divided into key-value pairs, and there is a notion of the “current” value of each key. While the state architecture ART is not done, keys for the Anoma state machine include:

  • nullifier identifiers (the state of which is, I think, true or false)
    • I have some ideas about structured indices for transparent nullifiers, but that’s another conversation.
  • label prefixes (the state of which is all resources in blob storage whose label has this prefix: this is a hierarchical index)
  • blob hashes (the state of which is either a blob with that hash, or null)

Dividing state this way makes concurrent execution easy if you know the keys your transaction is going to read or write.
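
As a toy illustration of that division (placeholder types, not the execution engine spec):

  -- Sketch of the key/value division described above; types are placeholders.
  import Data.Map (Map)
  import qualified Data.Map as Map

  type Hash        = String
  type LabelPrefix = String
  type Blob        = String

  data Key
    = NullifierKey Hash          -- state: roughly, spent or not
    | LabelPrefixKey LabelPrefix -- state: resources under this prefix (hierarchical index)
    | BlobKey Hash               -- state: the blob with this hash, or nothing
    deriving (Eq, Ord)

  data Value
    = NullifierSpent Bool
    | ResourcesUnder [Blob]
    | StoredBlob (Maybe Blob)

  -- The "current" value of a key, from the state machine's point of view.
  currentValue :: Map Key Value -> Key -> Maybe Value
  currentValue = flip Map.lookup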

A “general purpose predicate” is too broad an interface to build any concurrency on: each transaction candidate could affect the same data as every other transaction candidate.

Transaction candidates should not be able to query state “in the past.” The replicated state machine does have a notion of “current,” and the ability for “past” states to affect transaction candidates breaks the state machine abstraction.


another is

  • commitment tree roots; you have to be able to get them somehow to use the resource machine (‘somehow’ being a read-only transaction)

I’m running two-stage transaction candidates; first they emit the key subspaces they want to read/write (they are not allowed to read at this stage, of course); next they compute and may read, the transaction having a failure result if it violates what it emitted earlier.
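
In sketch form (names are mine and purely illustrative):

  -- Two-stage candidate: declare key subspaces first, compute second.
  type KeySubspace = String   -- e.g. a key prefix the candidate pledges to touch
  type Key         = String
  type Value       = String

  data Pledge = Pledge
    { readSubspaces  :: [KeySubspace]
    , writeSubspaces :: [KeySubspace]
    }

  data Outcome
    = Succeeded [(Key, Value)]   -- writes to apply
    | Failed                     -- e.g. the candidate touched keys outside its pledge

  data Candidate = Candidate
    { declare :: Pledge                            -- phase 1: no reads allowed
    , execute :: (Key -> Maybe Value) -> Outcome   -- phase 2: reads restricted to the pledge
    }

Enforcement of the pledge would live in whatever wraps the read function handed to execute.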

I don’t rely on this especially (shielded commitment trees might? the way I do it transparently feels like cheating) but why would you want to ban it / what does it possibly break, given that all that state was certainly readable at some point, and the “current state” is the sum over all of it, i.e. past state is always affecting every transaction candidate definitionally.

the “current” state is “state as of my own execution”, defined for every transaction candidate; conceptually every key has a “current” value as of then, though you can only see the ones you pledged to be able to read in the first phase because it’s concurrent.

If you define the “current state” as the “sum of all the states ever,” then you have 2 problems:

    1. you have 2 definitions of “the state at a time.” There is “most recent values of each key” and “all the values each key has ever had”
    2. you can never delete anything. We absolutely do want to be able to delete old stuff.

All replicas must agree on the outcome of all transaction candidates: we can’t have some of them “failing” a read of “old” state and some of them “succeeding,” so replicas have to decide what stuff is readable as of what time, meaning they are deleting it from the replicated state.
I am OK with allowing the replicated state machine to keep state around for a long time (or even forever), but I am not ok with mandating that it be kept forever.

The “keys” described in the execution engine spec should not be “part of the thing you need to look up current state,” where “the whole thing you need to look up current state” is key x timestamp. They should be the entirety of what you need to look up current state.

For example, if you want to preserve the ability to look up all the resources with Resource Logic L, label prefix P, that had been created as of time T, then the “Execution Engine Key” should be (L,P,T). These keys should be divided somehow across the shards, and data corresponding to each must be stored somehow.
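
For instance (a toy sketch; the real key structure is for the state architecture ART to pin down):

  -- Sketch: the (logical) timestamp is part of the key itself, so the key
  -- alone is the entirety of what you need to look up state.
  type Logic       = String
  type LabelPrefix = String
  type Timestamp   = Integer

  -- "Resources with logic L and label prefix P created as of time T".
  data EEKey = EEKey Logic LabelPrefix Timestamp
    deriving (Eq, Ord, Show)

  -- A toy shard assignment that ignores the timestamp, so all times for a
  -- given (L, P) land on the same shard.
  shardOf :: Int -> EEKey -> Int
  shardOf nShards (EEKey l p _) = (length l + length p) `mod` nShards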

They decide by following the deletion criterion for that piece of state. (When satisfied the state might or might not be deleted physically, but the state machine treats it as though it is.) Which could be “false” (forever) or “true” (I wasted gas writing something that can never be read) or “timestamp > N” or whatever is supported; at any rate it’s replicable. (When do we delete deletion criteria? Who knows, this spec is just a paragraph long right now)
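
In code-ish form (a sketch; constructor names are illustrative, not from the spec):

  -- Deletion criteria as replicable predicates over (at least) time.
  type Timestamp = Integer

  data DeletionCriterion
    = Never                    -- "false": keep forever
    | Always                   -- "true": wrote something that can never be read
    | AfterTimestamp Timestamp -- "timestamp > N"

  -- Every replica evaluates the criterion identically, so all replicas agree
  -- on whether the state machine treats the state as deleted.
  isDeleted :: Timestamp -> DeletionCriterion -> Bool
  isDeleted _   Never              = False
  isDeleted _   Always             = True
  isDeleted now (AfterTimestamp t) = now > t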

“The current state” is parameterized by “what transaction am I?”; it’s not even required that the system know an ostensible total order yet when it’s doing that read. But as a transaction you’re given that for free. A transaction never sees final timestamps.

Yes, but a transaction has access to its own timestamp. It must be able to, for example, read all blobs with resource logic L and label prefix P stored before time of this transaction. The transaction therefore requires (at least implicit) access to its own timestamp when defining its queries.

Yes, this is part of what defines “the current state.”

That’s true; “the tx hash” is a logical timestamp, just not a final totally-ordered one, and it has that.

It will be totally ordered, and the execution engine can wait to learn this total order before execution. For latency reasons, we should try to execute using as little ordering information as possible, so we can execute some things before total order information is available.

The no-op tx, which does no reads or writes, can just instantly finish execution and imposes no waiting; a read-only transaction has to wait but imposes no waiting, and a write-only transaction can instantly finish execution but imposes waiting on others. (For example.)
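
In terms of read/write footprints, that example amounts to something like (illustrative sketch):

  -- Whether a candidate must wait for prior state, and whether it makes
  -- later candidates wait, follows from its read/write footprint.
  type Key = String

  data Footprint = Footprint { reads :: [Key], writes :: [Key] }

  mustWait, imposesWaiting :: Footprint -> Bool
  mustWait       fp = not (null (reads  fp))   -- reading requires ordered prior state
  imposesWaiting fp = not (null (writes fp))   -- writing affects later readers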

For more details on running transactions before we have total ordering: Shard - Anoma Specification