we should have shippable signsfor proofs. the basic problem is that A would like to prove to B that X said something, but what it in fact has is a commitment from Y to that thing, where Y signs for X. there are two problems with making this implicit:

there is no global signsfor database; every node will have its own signsfor database (which may change over time); a node cannot remember every piece of signsfor evidence it sees, because that would be a trivial dos attack. so it may be that A has evidence showing that Y signs for X, but B does not

even assuming the individual pieces of evidence are shared, signs-for is transitive, and computing the transitive relation is equivalent to graph reachability, which is O(hard). the onus therefore should not be on B to compute it; if A wants to convince B of something, then it should be A’s responsibility to do the hard work

therefore: A should be able to find and share the specific path through its signsfor db that shows what it wants to show, producing a proof that B can straightforwardly verify in linear time

compositional identities complicate matters slightly. it can probably be taken for granted that X signs for X \lor Y. but things get more interesting when you have to mix cryptography with logic; what about, say, showing that P \land Q signs for X \land Y \land Z from the fact that P signs for X \land Y, and Q signs for Y \land Z? i don’t think it’s important to be able to infer that sort of relationship automatically right now (if ever—i’m not sure how to do it and i’m not sure what the point would be), but we should make sure the proof format can handle everything

probably something like the following? define some primitive inference rules, and the proof is a set of cryptographic signs-for proofs plus a sequence of directed rewrites (each justified by an inference rule and the cryptographic proofs) which takes you from the ‘source’ id to the ‘target’. (maybe we should do cnf/dnf implicitly, maybe that simplifies the proof encoding or something? no because renormalising after each step is probably O(bad))

Yes, I see. I agree with the goal. I think we can start with simply a fully-connected path from source to target, and perhaps consider a more sophisticated inference-rule/rewrite system later.

The other aspect of this is that it would be helpful for A to know which signs-for relationships B knows (specifically, which signs-for proofs B has already verified), so that A doesn’t need to re-send and B doesn’t need to re-verify those proofs (A can still do the search and reference them, but only by hash). This is a situation we face in many cases - a sort of distributed cache awareness problem - and it seems like what we might want to sometimes do is have A and B (if A and B frequently interact) track parts of their interaction history, such that A can store a bit for pieces of data that A believes B knows (and in the case of proofs, perhaps has verified), and not send that data (B can always request it if B no longer has it). This carries some overhead cost which I have not fully analyzed. Curious what you think of this problem.

so reasoning through: for a given node N that i know about and resource R (by resource i just mean data not rm resource) that i know about, i want a function f: Node → Resource → 2 returning a guess as to whether we think N knows R already or not. (or maybe it should return a confidence interval or distribution or something; whatever.) the question is: how accurate of an f can we make? (or: what is the global tradeoff space of possible fs?) i would expect that in general the accuracy of f should drop off as R becomes less popular, and as N becomes someone we peer with less. and i would expect that the accuracy drops off very fast, which seems like bad news. but! both peer communication and resource popularity likely follow a pareto distribution, i.e., there’s a small number of very popular resources, and most of our communication is with the same few peers. so probably? we can get somewhere decent with a bloom-filter-alike for the top x% peers and the top x% resources, and use that for speculative sending. (or maybe more like—pick a very small x x’ and a small-medium y y’, and have one filter for each top x% peers guessing whether they have just the top y% resources, and for each top x’% resources guessing whether they’re known by just top y’% peers.)

but this is all reasoning from first principles and it is entirely possible that practice refutes any or every single thing i conjectured

also, just ignoring this should ideally cost no more than one round trip, so i am not sure if it is worth the potentially wasted traffic?

Yeah, that sounds roughly right. If we have an f that returns a probability in the range [0, 1] (being 0 e.g. if the data was just created and 1 if we know that the other node has promised to store the data), we can use the bandwidth - latency preference tradeoff at runtime to determine whether to include data which might be unknown to the recipient or not, maximizing for the node operator’s preference in expectation. If the node operator for whatever reason wants to minimize worst-case latency, we can just send all the data (at least all that we have).