Table of Contents
(sorry, the links won’t work)
- First, get “scry” out of the way
- Keyspaces
- Timestamps
- The top owner
- Interpreting ownership information
- Timestamp ownership
- Keys are not necessarily dumb stores
- The (pen)ultimate segment
- Local values
First, get “scry” out of the way
Scry is just a low-level referentially transparent read operation. I.e., it’s
- low-level: that is, it has an opcode at the bottom layer;
- referentially transparent: that is, a scry, if it evaluates to anything at all, evaluates to the same value for the same key always and everywhere
Therefore, the VM can cache it, for example; if you learn a value, it’s valid for its key. When you need a new value, you have a new key as well (usually; there’s some twisty logic in how read-only transactions fetch things into the local state, explained below). Moreover, every program, if it terminates, terminates to the same value every time. (Values associated with a key are free to go unavailable; if so, a program which unconditionally reads their key crashes there. It’s free to branch on first asking if a read would succeed.)
Those are the properties we mean when we say ‘scry’.
Keyspaces
However, an unstated assumption I keep making, which I don’t believe is controversial but which deserves to be stated, is that the keys are segmented, or equivalently that they’re lists. Conventionally I write the segments separated with /
as e.g. /a/b/c
, like a filesystem path.
Why like a filesystem path? Because keyspaces are implicitly hierarchical. That is, the order of segments matters; it’s not a set of attributes which determine a key, but a list of (identifiers? practically speaking, they’re strings).
That’s not to say that I endorse hierarchical databases generally. Modeling data as hierarchical is somewhat fraught for various reasons; associations between data follow a lot of different logical predicates that can’t be reduced to “inclusion-within”. Relational database curmudgeon C. J. Date characterizes this as introducing an asymmetry relative to the sorts of queries one wants to support; some of those become arbitrarily more difficult. (I discuss below the concept of gonzo-embedding an actual query language inside a key, which is different because it’s a language. Though we may come up with a better idea, too.)
What’s hierarchical about keys and keyspaces is the ownership information they represent, which I hope I can define reasonably herein.
Timestamps
Before discussing keyspaces proper, I should introduce the also uncontroversial idea of timestamps; these can be as simple as sequence numbers. At least one segment of the key being a timestamp is the way in which “referential transparency” and “values are updated as the system state evolves” are reconciled; older timestamps might remain readable, or go unavailable, as a matter of discretion. Different applications will have different needs, and different operators will be willing to commit to different levels of service.
The top owner
Even if it’s often left implicit, all Anoma keys should start with /anoma
(if we write this as a list of strings it’s [“anoma”], for reference). This probably takes up more than 5 bytes; when we account for all these bytes into perpetuity, are they well-spent on a useful concept, or are we just repeating the old mistake that e.g. IRC made when it made its most common command PRIVMSG
rather than MSG
?
I believe that we are; for one thing, if it’s ever desirable to interoperate with a neighbor keyspace, it requires less mangling, but this is ultimately kind of a silly reason. The real reason is that this expresses the first piece of ownership information about a key: that it’s an Anoma key.
Interpreting ownership information
Or rather, interpretation from ownership information. That is, ownership information is about setting the rules of interpretation for the rest of the key.
So inside the keyspace /anoma
, keys are interpreted following anoma
rules. That’s what it means for them to be “owned”. In fact, we can stop here and defer that to a future “Rules of the Anoma Keyspace” document (I won’t, though.)
But there’s a problem that arises quickly: assume for the sake of argument, so that I can use a concrete example (you may not nitpick it) that an “Anoma consensus” is a thing which can exist, and that it can store “blobs”, hash-addressed pieces of arbitrary data. (We can elide timestamps by assuming our hash function never collides, which is usually what people assume when they use hash functions so it’s not too far-fetched.)
The obvious key schema that comes to mind is something like /anoma/[consensus id]/blob/[hash]
. But isn’t it Anoma, not this individual Anoma consensus, which defines the concept of “blob storage”? Shouldn’t it be more like /anoma/blob/[consensus id]/[hash]
if I’m putting forth a theory of hierarchical keys as containing hierarchical ownership information where each prefix defines how its suffix is interpreted?
The glib but unsatisfying answer is “anoma
defines how everything below it is interpreted, and if it wants to say it uses this order, it can”.
The answer I actually want to give is “ownership information also includes the more ordinary sense of ‘ownership’, and it’s Consensus A’s blob store”, but with a dash of “and if there’s any version heterogeneity between consensuses, i.e. they speak compatible but different Anoma protocols, this is where we want it to live”. That is, we’re using Consensus A’s blob store, meaning the exact version of “blob store” Consensus A implements, and not any other broadly compatible ones. (Consensus A migrated to a post-quantum gigahash earlier than others, or whatever.)
But there is some ambiguity about what hierarchies are “correct”. I think this is probably an inherently ambiguous problem where decisions just have to be made as correctly as possible.
Timestamp ownership
One thing that’s not ambiguous, or shouldn’t be, is where to put timestamp segments if they’re present (content-addressable data can assume hash noncollision in lieu of timestamping, as shown in the example above). Timestamp segments go right after whoever sets the clock, or assigns the sequence numbers or whatever form the timestamp takes.
So it’s fairly clear that we can have something like /anoma/[consensus id]/[height]/rm/cm-root
as a key. This is if the consensus has one sequence number over all storage; if the individual RM maintains the sequence number itself (not contemplated in current design, but one can imagine it) then it would instead be /anoma/[consensus id]/rm/[height]/cm-root
. In either case the key is to a commitment root, which is clearly under an RM, and parametrized by a timestamp (since new roots happen every time new commitments are committed).
Keys are not necessarily dumb stores
The interpretation of a key suffix imposed by its key prefix does not have to be “read from a database”. A commitment root is again a good example. In real life, we probably do compute and store these and do a read to answer queries about them. But we could also, instead, only have stored the commitment tree, and do a computation to hash everything up and find the root. Whatever computation we do is still going to need to obey timestamping rules, since it’s supposed to be referentially transparent. On the other hand, it’s referentially transparent - i.e., it can be cached after being computed once. (And e.g. the cm-root computation can know “if we’re between root updates, get the cached one”.)
This also means that a query DSL can be embedded into a keyspace. I don’t have a toy one of those ready to go but a silly example might be /anoma/[consensus id]/query/blob-starts-with/abc077
to enumerate all blobs whose hash starts with abc077
. Naturally, this has to be a computation. (A cacheable one, and even a tunably cacheable one; blob-starts-with/a
will change a lot more frequently than blob-starts-with/abc077fe17
.)
The (pen)ultimate segment
Every key must come to an end; when it does, it presumably identifies some value associated with it. The last segment can be something like cm-root
(the value is the commitment root) or [hash]
(the value is the blob with this hash); what makes it last is that it owns exactly one piece of data. (Can it be set-valued? No, but its value can be a set, of course.)
But an interesting possibility is that the last segment might be not the last segment; something which can go after it is representation information. I.e., take /anoma/[consensus id]/[height]/rm/cm-set
. This key gives us the value of the commitment set. But what is the value of the commitment set? We’ve come up with, at some point, an arbitrary choice of data type to use.
However, the key can be extended to e.g. /anoma/[...]/cm-set/json
. Having identified the value, we further identify how we wish to read the value; in this case, as json (“json” isn’t very specific, of course, but we can say the json representation of a commitment set is a json array of strings, or something. If we wanted something other than json, we could say that.
This is a little bit distinct from the (currently unused) mystery type argument to scry itself; that argument is in the context of runtime type checking using an actual language’s actual typechecker. (Look for Terence and I to make this real in forthcoming work.) I carelessly called it a ‘type’ in an earlier draft, but it’s not quite that; thinking of programming language types, if we read the json commitment set, it could be “string” because json is strings, or we could be in typescript and actually say “array of strings”. It could be either more or less specific; it’s really a different dimension than ‘type’.
Local values
Some values aren’t independent values, but local caches of values we’ve read. This is the secret backing mechanism whenever we query against apparently-timestampless, “latest value” keys: a local cache, where timestamps can be elided (just say it’s the system time), populated by read-only transactions reading keys as of whenever they’re able to be executed. This sort of operation is a popular one; before doing expensive local computation, we often want to sync some best-effort fresh state from online.