On classifying tractable CSPs
The dichotomy theorem
In my conversation with @degregat, we discussed classifying CSPs by complexity. In his post, he points to the paper:
It was a good paper when it was released, but that was in 2006, and the field has advanced considerably since then. In 2017, the full dichotomy theorem for finite-domain CSPs was proven in
I don’t recommend reading either of those papers; I’ll give a better follow-up article in a bit. I will summarize the result, though. It’s about CSP languages. We can describe these as a kind of algebra. Every CSP language contains a finite set of n-ary relations. It is closed under relational composition, that is, under conjunction of relations that may share variables. Such an algebra is called a “relational clone”.
Each CSP language has a corresponding algebra of “polymorphisms”. A polymorphism of a CSP is a function that preserves all of its relations. For a binary relation R(X, Y), a unary function f preserves R if R(a, b) implies R(f(a), f(b)); more generally, a k-ary function f preserves R if, whenever R(a₁, b₁), …, R(aₖ, bₖ) all hold, so does R(f(a₁, …, aₖ), f(b₁, …, bₖ)). If f satisfies the analog of this for every n-ary relation in the CSP, it’s a polymorphism.
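To make the definition concrete, here is a minimal sketch (the two-element domain, the ≤ relation, and the helper name `preserves` are my own illustration, not from any cited paper) that checks whether a function preserves a binary relation:

```python
from itertools import product

# Domain {0, 1}; a single binary relation R = "less than or equal".
DOMAIN = [0, 1]
R = {(a, b) for a in DOMAIN for b in DOMAIN if a <= b}

def preserves(f, arity, rel):
    """Check whether the `arity`-ary function f preserves the binary
    relation rel: for any tuples (a1,b1),...,(ak,bk) in rel, the pair
    (f(a1..ak), f(b1..bk)) must also be in rel."""
    for rows in product(rel, repeat=arity):
        a = f(*(row[0] for row in rows))
        b = f(*(row[1] for row in rows))
        if (a, b) not in rel:
            return False
    return True

# Projections are always polymorphisms...
proj1 = lambda x, y: x
print(preserves(proj1, 2, R))  # True
# ...and for this R, so are min and max:
print(preserves(min, 2, R))    # True
print(preserves(max, 2, R))    # True
```

Since the domain and relation set are finite, being a polymorphism is a directly checkable property, which is what makes classifying CSPs by their polymorphism algebras feasible in the first place.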
This has a connection to programming language theory through Reynolds’s theory of relational parametricity. There, polymorphism (in the programming sense) is defined through a function preserving all relations. This is the same concept, but higher-order over an infinite domain, while we’re dealing with first-order functions over a finite domain. At first order, the only parametrically polymorphic functions are the projection functions (including identity as a special case). As such, projections are always polymorphisms for any CSP. Additionally, with some thought, we can conclude that, if f and g preserve a relation, then so does their composition. Beyond this, I don’t think anyone’s studied this connection with any seriousness.
This gets us an algebra of functions that includes all the projection functions and is closed under composition. This is called a “functional clone”. Incidentally, such a thing is usually just called a “clone”, and such function algebras are the domain of clone theory (see, e.g., “Function algebras on finite sets: A basic course on many-valued logic and clone theory”). Relational clones don’t seem widely studied.
Modern approaches to CSP tractability focus on the structure of this polymorphism algebra. The low-resolution version of the dichotomy theorem states that
Any finite-domain CSP is either NP-hard or all queries are P-time solvable
A higher-resolution version of this theorem would state
Any finite-domain CSP whose polymorphism algebra is essentially a clone of projections (in other words, whose polymorphism algebra is trivial) is NP-hard; otherwise, all queries are P-time solvable
Note that I’m hiding some technical details by using the word “essentially”; something can be “essentially just a bunch of projections” without literally being that, see the unification paper I will soon cite for details.
This has some significant consequences: it shows there are no finite-domain CSPs strictly between P and NP-hard; there are no quasipolynomial-time CSPs, etc. Also, tractability can be framed by studying the algebra of polymorphisms. All the dichotomy results in the original survey article are special cases of this.
For a survey of the general approach, see
These, unfortunately, do not go into the proof of the dichotomy theorem. The most cogent presentation is:
The original proofs used different, hard-to-grok methods; the goal of that paper is to unify them under a single framework through the notion of a Minimal Taylor Algebra. A Taylor Algebra, A, is an algebra such that no quotient of a subalgebra of A is a two-element algebra whose every operation is a projection. It’s a way of defining a nontrivial clone in full technicality.
There have been many directions of work trying to apply the so-called “Algebraic Approach” to CSPs to other domains.
The most active is promise CSPs. There, we are promised that an instance is satisfiable in one CSP language and are only asked to find a solution in a related, weaker language; the classic example is being promised that a graph is 3-colorable and being asked to find a coloring using some larger number of colors. This doesn’t seem so relevant to our goals, but it should be looked at in more detail to be sure.
A different domain that’s probably more relevant is quantified CSP. This allows universal, and further, alternating quantifiers as part of a query. This can be seen as a sort of function-synthesis problem through the notion of Skolemization. It’s known that the full quantified Boolean formula problem is PSpace-complete. There was a trichotomy conjecture asserting that all finite-domain QCSP problems are either PSpace-hard, NP-hard, or P-time tractable, but this has since been disproven. See
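As a toy illustration of the Skolemization view (the formula and the helper `has_skolem_witness` are my own invention): deciding ∀x∃y.φ(x, y) amounts to synthesizing a function y = f(x), which over a finite domain we can simply brute-force:

```python
from itertools import product

# QCSP/QBF instance: forall x. exists y. (x != y) -- true, witnessed
# by the Skolem function y(x) = not x.
def phi(x, y):
    return x != y

def has_skolem_witness(phi):
    """Search over all unary Boolean functions f (encoded as truth
    tables) for one satisfying: forall x, phi(x, f(x)). Finding such
    an f is exactly Skolemizing the inner existential quantifier."""
    for table in product([False, True], repeat=2):  # f(False), f(True)
        f = lambda x, t=table: t[int(x)]
        if all(phi(x, f(x)) for x in (False, True)):
            return dict(enumerate(table))  # the witnessing truth table
    return None

print(has_skolem_witness(phi))  # {0: True, 1: False} -> y = not x
```

The doubly exponential blow-up of enumerating Skolem functions is exactly why QCSP complexity climbs toward PSpace as quantifier alternations are added.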
Full classification of QCSPs is still an open problem. Some large classes are classified, see for example
There is also some work on infinite-domain CSPs. This is a bit vague on its own, so additional restrictions are added. Common ones are working over a numerical domain, being ω-categorical, or being equipped with an oligomorphic permutation group.
Another relevant area is parameterized complexity of CSPs. This seems understudied, but the idea is to loosen the notion of polymorphism to that of a partial polymorphism that preserves some but not all relations of the CSP. The details of this can allow one to classify queries by difficulty, allowing a worst-case intractable CSP to have a large collection of tractable queries. See:
This area, as a whole, seems underdeveloped.
CSPs themselves are often described in highly linguistic terms, mentioning statements, variables, etc. This is not necessarily the best thing for formalization, as dealing with nominal details like variable naming would, if unnecessary, distract from the point. There are two main approaches to formulating CSPs in a neat way.
Graph homomorphism: Given a finite domain CSP, we can describe a query through a labeled hypergraph. For example,
R(X, Y), S(Z, X, W), Q(Y, Z, W)
can be turned into a graph with four (unlabeled) nodes, one each for X, Y, Z, and W, along with directed (hyper)edges between X and Y (labeled “R”); Z, X, and W (labeled “S”); and Y, Z, and W (labeled “Q”). We also have the domain, which we can imagine as a graph whose nodes are all the terms within the finite domain, connected by directed hyper-edges labeled by atomic relations. A solution to the query is then an edge-label-preserving homomorphism from the query graph to the model graph. This may be unwieldy if done directly, but some version of this idea may be fine if we don’t directly encode the model graph in its entirety. This shows that the CSP problem is equivalent to the (colored-edge hyper)graph homomorphism problem.
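A brute-force sketch of this reading (the tiny domain and relation tables below are invented purely for illustration): solving the query means finding a label-preserving map from query variables into the model:

```python
from itertools import product

# Query hypergraph: hyperedges labeled by relation names over variables.
query = [("R", ("X", "Y")), ("S", ("Z", "X", "W")), ("Q", ("Y", "Z", "W"))]

# Model hypergraph: the domain plus the tuples in each atomic relation.
domain = [0, 1, 2]
model = {"R": {(0, 1)}, "S": {(2, 0, 1)}, "Q": {(1, 2, 1)}}

def solve(query, domain, model):
    """Brute-force search for an edge-label-preserving homomorphism
    from the query hypergraph to the model hypergraph: an assignment
    of domain elements to variables that maps every labeled query
    edge onto a model edge with the same label."""
    variables = sorted({v for _, vs in query for v in vs})
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all(tuple(h[v] for v in vs) in model[rel] for rel, vs in query):
            return h
    return None

print(solve(query, domain, model))  # {'W': 1, 'X': 0, 'Y': 1, 'Z': 2}
```

This is exponential in the number of variables, of course; the whole classification effort above is about which relation sets admit something better than this search.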
Point-free relation algebra: This essentially would involve defining a small set of point-free combinators capable of combining into any query. There are point-free approaches to relation algebra made for the logic programming context, see, for example:
The first (graph homomorphism) may be better for creating a small circuit, while the second (point-free relation algebra) would probably be better for a programmer to use.
Is CSP right in the first place?
It seems to me that most of the time, we won’t be able to satisfy all intents. As a result, any CSP algorithm would just give a negative result. What we want to do is satisfy the maximum number of constraints.
This is the MaxCSP problem.
Robust satisfiability is an attempt to characterize problems where, if an instance has an almost-satisfying assignment, then such an assignment can always be found efficiently. Note that this doesn’t necessarily cover MaxCSP in its entirety: a MaxCSP problem could still be tractable in practice even if it isn’t robustly satisfiable. That would mean some instances are easily approximable, but that, in general, we couldn’t do better than guessing. Nor does it say that we can’t efficiently maximize the number of satisfied clauses on average, or efficiently with respect to some parameter other than problem size.
The most important paper is
This proves that any bounded width CSP is robustly satisfiable. Bounded width, in this case, means that the CSP cannot encode linear equations over abelian groups.
Further, according to
any efficiently robustly satisfiable CSP must have bounded width, though I can’t actually find this in the paper. Instead, the paper states something more specific: that linear equations over an abelian group G, where each equation contains precisely 3 variables, cannot be efficiently robustly satisfied; the first paper interprets that result more broadly.
Assuming all that’s correct, bounded width is exactly the condition needed to efficiently and robustly maximize satisfied constraints, and one cannot do better.
TODO: Other research? MaxSAT?
Important Note: Checking vs solving
I was given the task to formulate a logic for a tractable CSP in Vamp-IR or Juvix. It's worth noting that this isn't really necessary. Verification is done after a solution is found, and that's always doable efficiently, regardless of how difficult finding the solution was in the first place. A good example to keep in mind is Diophantine equations (systems of polynomials with *integer* solutions). That domain is known to be undecidable, certainly harder than anything in NP, but if you have a specific solution, then checking that solution is as simple as plugging it in and evaluating the system. What this means is that the language of intents does not need to match the language the verifier is using; the latter could be a simpler NP-complete language into which we can embed the more complicated, restricted language.
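To spell the Diophantine example out (the particular system below is made up for illustration): finding integer solutions is undecidable in general, but checking a candidate is pure evaluation:

```python
# Toy Diophantine system: x^2 + y^2 == z^2 and x + y + z == 12, over
# the integers. *Finding* solutions to general Diophantine systems is
# undecidable (Hilbert's 10th problem); *checking* one is trivial.
def check(x, y, z):
    return x**2 + y**2 == z**2 and x + y + z == 12

print(check(3, 4, 5))  # True: 9 + 16 == 25 and 3 + 4 + 5 == 12
print(check(1, 2, 3))  # False
```

The asymmetry between these two costs is the whole point: the verifier only ever runs the cheap direction.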
It does seem to me that the discussion thus far has been backward. We’ve looked at CSPs that are tractable, but why not ask, instead, for a logic which is simply complete for all tractable CSPs?
The immediate question is whether something like this is possible. This problem is “Gurevich’s Problem”, and is, in its most general form, still open.
It’s also been called “the most important problem of Finite Model Theory” and “the central open problem in descriptive complexity theory”.
However, there are many positive results.
In a totally ordered domain, two main logics are known to capture exactly the poly-time queries:
- First-order logic extended with a Least Fixed Point operator for defining new relations captures exactly the P-time executable queries.
THEOREM 1. Let L be a query language consisting of relational calculus plus the least fixed point operator. Suppose that L contains a relation symbol for a total ordering relation on the domain (e.g., lexicographic ordering). Then the queries expressible in L are exactly the queries computable in polynomial time.
- It was proved that “the restriction of second-order logic to formulae whose first order part is a universal Horn formula” captures all P-time queries over ordered structures.
- “The expressive power of second-order Horn logic” by Erich Grädel
This covers Boolean domains as a special case. If we are willing to translate everything into a Boolean domain, then these logics capture all the poly-time queries expressible after said translation.
Also, note that least-fixed-point logic is closely related to Datalog, which corresponds to its existential positive fragment.
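The least-fixed-point operator can be illustrated with the classic Datalog transitive-closure program (a standard textbook example, sketched here by hand):

```python
# Datalog-style least fixed point: transitive closure of an edge set.
#   path(X, Y) :- edge(X, Y).
#   path(X, Y) :- edge(X, Z), path(Z, Y).
edge = {(1, 2), (2, 3), (3, 4)}

def lfp_path(edge):
    """Iterate the rule operator until nothing new is derivable; the
    result is the least fixed point, reached in polynomially many
    rounds since the set of possible pairs is polynomial in size."""
    path = set(edge)
    while True:
        new = {(x, w) for (x, z) in edge for (z2, w) in path if z == z2}
        if new <= path:
            return path
        path |= new

print(sorted(lfp_path(edge)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Transitive closure is the standard example of a poly-time query that plain first-order logic cannot express, which is exactly what the fixed-point operator buys.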
An example of a poly-time query not expressible in this setting is the parity of an unordered set (determining if the number of elements in a set is even or odd).
The limitation is that an intrinsic ordering isn’t present on arbitrary graphs. If the graph is canonicalized, then this canonicalization implies an intrinsic ordering. The existence of a polynomial-time graph canonization algorithm is still open; it is known to be doable in quasi-polynomial time. See:
The goal is to create a query language whose expressive power is independent of how the model graph is laid out in memory. The only major candidates are a logic called “Choiceless Polynomial Time” (CPT) and extensions of it. Specifically, the variant with the ability to count is the minimal candidate still standing.
CPT with counting is essentially the weakest logic such that all queries are in PTime and there are no known PTime queries that are not expressible in it.
There is much ongoing research trying to separate this logic from P. There are also attempts to minimally extend CPT so that its canonization invariance is more explicit. The most promising is
This reminded me of a project that I previously worked on called OSL. GitHub - Polytopoi/osl: A spec language for zero knowledge proofs
It’s a kind of finite-domain second-order logic specifically designed for defining predicates for zk applications. I’ve been asked about it before, specifically in comparison to Vamp-IR. It really isn’t like Vamp-IR, but it might be more relevant here.
There is some work on formulating ideas within this domain in a generic, categorical framework. This descends from Abramsky’s work on Arboreal categories and the pebbling comonad.
My default suggestion would be to use an out-of-the-box solver. Converting to a domain supported by Z3 and using that is not a bad choice. One could instead use an ASP system like Clingo. There are, of course, other solvers. If we use MaxSAT or some kind of integer programming framework, such as Gurobi, that would imply different encodings.
TODO: Get a list of reasonable options; Dedicated SAT solvers, ASP systems, Integer Programming frameworks, Other constraint solvers?
@degregat wondered about the overhead of encoding a CSP into a SAT solver. There are simple, naive encodings with linear overhead. Much of the effort spent on encodings deals with concerns like propagation efficiency; that’s both a difficult problem in general and can inflate the size of encodings. What we’re doing seems similar to the kinds of programs an ASP system would handle; it may be worth looking into how a system like Clingo translates things into its SAT solver.
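Here is a sketch of the naive "direct" encoding (the function name and clause layout are my own; real encoders add refinements for propagation strength): one Boolean variable per (CSP variable, value) pair, with conflict clauses linear in the number of forbidden tuples:

```python
from itertools import combinations, product

def encode(variables, domain, constraints):
    """Direct CNF encoding of a CSP. `constraints` is a list of
    (scope, allowed_tuples). Returns clauses over integer literals;
    literal +/-k refers to the k-th (variable, value) pair."""
    var_ids = {(x, v): i + 1
               for i, (x, v) in enumerate(product(variables, domain))}
    cnf = []
    for x in variables:
        # each CSP variable takes at least one value...
        cnf.append([var_ids[x, v] for v in domain])
        # ...and at most one value
        for v, w in combinations(domain, 2):
            cnf.append([-var_ids[x, v], -var_ids[x, w]])
    for scope, allowed in constraints:
        for tup in product(domain, repeat=len(scope)):
            if tup not in allowed:  # forbid every disallowed tuple
                cnf.append([-var_ids[x, v] for x, v in zip(scope, tup)])
    return cnf

# X != ... : binary "exactly one of (0,1),(1,0)" constraint over {0,1}
cnf = encode(["X", "Y"], [0, 1], [(("X", "Y"), {(0, 1), (1, 0)})])
print(len(cnf))  # 2 at-least-one + 2 at-most-one + 2 conflict clauses = 6
```

The at-most-one clauses are quadratic in the domain size here; this is one of the places where smarter encodings (ladder, commander, etc.) trade clause count against propagation behavior.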
It’s possible to transform a problem into a smaller problem from which we can extract a solution to the original problem. This is probably worth looking into:
TODO: Both SAT encodings and ILP encodings
@cwgoes mentioned a SAT solver I wrote and wondered about its applicability: GitHub - AHartNtkn/odesat: A sat solver that simulates ODEs to solve SAT problems.
In it, a CNF formula is translated into an ODE where every Boolean variable is mapped to a continuous variable, and every clause is granted two auxiliary variables, one representing a kind of short-term memory and the other a kind of long-term memory. The precise way the derivatives for these variables are defined guarantees that the vector field over the Boolean variables is point-dissipative, that is, any volume of phase space always decreases in size. This prevents things like cycles or chaos; all trajectories eventually end up in stable equilibrium wells. Additionally, these wells correspond exactly to the solutions of the original SAT problem.
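To give the flavor, here is a heavily simplified toy in the same spirit; it is not odesat’s actual equations (a single memory term per clause instead of two, hand-picked constants and step size, plain Euler integration):

```python
import random

# Toy continuous-time SAT dynamics. Each Boolean variable becomes a
# continuous s_i in [-1, 1]; each clause m gets one growing "memory"
# weight a_m; the flow pushes the state toward a point where every
# clause term K_m vanishes, i.e. a satisfying assignment.
def solve_sat(clauses, n, steps=20000, dt=0.05, seed=0):
    rng = random.Random(seed)
    s = [rng.uniform(-1, 1) for _ in range(n)]
    a = [1.0] * len(clauses)
    for _ in range(steps):
        # K[m] -> 0 as some literal of clause m nears full satisfaction
        K = []
        for clause in clauses:
            k = 1.0
            for lit in clause:
                i, c = abs(lit) - 1, (1 if lit > 0 else -1)
                k *= (1 - c * s[i]) / 2
            K.append(k)
        if all(k < 1e-4 for k in K):
            break  # every clause is (nearly) satisfied
        ds = [0.0] * n
        for m, clause in enumerate(clauses):
            for lit in clause:
                i, c = abs(lit) - 1, (1 if lit > 0 else -1)
                rest = K[m] / max((1 - c * s[i]) / 2, 1e-12)
                ds[i] += 2 * a[m] * c * rest * K[m]
            a[m] += dt * a[m] * K[m]  # memory grows while m is violated
        s = [max(-1.0, min(1.0, s[i] + dt * ds[i])) for i in range(n)]
    return [x > 0 for x in s]

# (x1 or x2) and (not x1 or x2) and (x1 or not x2) -> x1 = x2 = True
print(solve_sat([[1, 2], [-1, 2], [1, -2]], 2))
```

The clause memories are what distinguish this family from plain gradient descent: a long-violated clause accumulates weight until it drags the trajectory out of whatever region was ignoring it.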
This idea comes from the paper:
There is also an integer programming version of the idea;
According to the creators of the framework, this approach tends to scale more robustly, irrespective of problem structure, compared to traditional SAT solvers. I can’t verify this, but I can verify that it’s at least effective as a technique. Conceptually, it seems like a continuous, fully parallel variant of WalkSAT, but less naive than the obvious translation.
It can solve hard SAT problems, but it tends to be slower than traditional methods, if only due to the overhead of simulating a dynamical system. The current version of odesat doesn’t implement parallelism. I wanted to make a GPU version using ArrayFire, but I couldn’t get it installed, so I gave up; a GPU version would be necessary for maximum efficiency. As it is, it’s likely 100x-1000x slower than it would be on a GPU.
This approach may actually be well suited for MaxSAT, as it can continuously try to maximize the number of satisfied clauses and will keep trying to do better for as long as it runs. My understanding is that it’s guaranteed to find the actual maximizing assignment eventually, but unsatisfiable instances don’t have any stable equilibria, so the maximizing assignment would be part of a cycle of near-maximal assignments that the system eventually settles into.