This post will summarize the broad research on composable CSP solving I’ve done so far. There are a lot of specific and quite technical topics to cover, so I can’t be thorough, but my goal is to give an overall impression of the relevant subject matter and point to some decent sources that could be used for future, more detailed research. I don’t feel like this is finished, so I’ll continue updating this over the next week or more.
This is the way @cwgoes described the decomposition framework to me,
- An agent would get a segment of the CSP, which it would then “simplify” into a smaller CSP. This smaller CSP would then be sent (along with other small CSPs from other sources) to another solver. This leads to a tree of solvers which aggregate solutions to create a solution to the original problem.
I don’t know how much we can vary from this because I do not have the context necessary to know where this setup comes from. If it can’t be varied, then some of the things I will talk about are less relevant or even useless to us. If this can be varied, then some things are more applicable.
The most directly relevant methods are so-called “preprocessing” techniques. These are methods that can be used to reduce the size of a problem. These fall into semantics-preserving transformations and satisfiability-preserving transformations. The former can be applied at any time without limitation, though they are unlikely to be able to reduce a typical problem to any significant degree. Satisfiability-preserving transformations are more likely to be applicable but have some limitations.
To illustrate the nature and limitations of these, I’ll use blocked clause elimination as an example. Technical details aren’t so important, just know that it’s a method to detect when a clause can be removed from a SAT problem in conjunctive normal form while preserving satisfiability.
There are two important things to note when it comes to applying these sorts of techniques to a problem split up between different agents.
What does the agent know? In general, one cannot detect blocked clauses without access to the full problem. However, clauses are always blocked by a specific variable, so if an agent has control over all the clauses that a variable appears in (common in the distributed CSPs I’ll talk about in a later section) then the agent can at least detect if the variable it has control over blocks any of its clauses. If a clause is blocked, this information needs to be communicated with any other agent that shares those clauses. How we break up the problem has a substantial effect on what valid transformations are detectable.
Generally, formula preprocessing only preserves satisfiability (or optimizability), not semantics. This means a solution to the simplified problem can be used to construct a solution to the original problem, but they do not share solutions. If a blocked clause is removed, this information needs to be kept around. It’s not helpful to other solvers; rather, it would be sent to an aggregator that takes the solution the solvers found along with a trace of preprocessing steps in order to calculate a solution to the original problem.
Since we are specifically interested in optimization, there are additional concerns related to cost-preservation after transformation. Naively applying transformations will not generally preserve costs, but there are relatively simple steps one can take to ensure these are preserved.
Preprocessing is a broad field, and methods can vary substantially depending on the domain. The preprocessing chapter of “Handbook of Satisfiability” is a good source for ordinary SAT solving.
Our problem most closely resembles mixed integer linear programming. Methods in this domain are substantially less developed, but some exist, such as those in
Proper simplifications are rare in general. In order for the scheme described to me to be practical, the simplified constraint problem should always be substantially smaller than the input. This really can’t be done without simply committing to a specific choice, thus limiting optimization potential. This would likely imply that earlier solvers would lock in values to optimize over an expectation of the variables those agents don’t control. This matches what some incomplete algorithms for distributed CSP do.
Decomposing CSPs is a highly nontrivial task. Not because splitting the CSPs themselves is hard; what’s hard is splitting them up in a way that solutions to the smaller problems can be recombined. For SAT solving, the most common method to split up a problem picks a variable and substitutes it as true and false in two copies of the original problem. These two copies are then simplified and sent off to two separate solvers. This eliminates a variable, but, generally, the two subproblems are approximately the same size as the original problem. As such, the amount of total work done to solve the two subproblems will be ~2x the work needed to solve the original problem. The reason it’s most popular is that it’s the only method that works for any problem and always provides some benefit in the presence of parallelism. A similar procedure can be done for any (set of) variables within a finite domain CSP. Other methods can offer more advantages but come with some limitations.
Here’s an example decomposition technique from sat solving.
One can identify a vertex separator of the problem graph which splits the problem into two problems only connected by the fact that they share the variables in the separator. We then fix the values of all the variables in the separator. Assuming there were n variables in the separator, we now have 2^n sub-problems, each, roughly, half the size of the original problem. This means the new (total) work will be ~2^n * 2^(m/2), where m was the size of the original problem. Compared to the original work of ~2^m, if n is small, this could be a considerable asymptotic efficiency boost. However, random SAT problems can’t generally be efficiently decomposed like this since their minimal vertex separator will have roughly the same number of variables as m/2, meaning we’d be back to 2^m work and wouldn’t gain any advantage. Also, finding minimum vertex separators is, itself, an NP-hard problem, but decent approximation algorithms do exist. We’d have to do a survey of real-world problems we’re interested in to see if this will generally be effective in practice.
It’s also worth noting that this transformation makes the problem much larger, but, potentially, easier to solve. If we want the problem to become smaller, this generally won’t do. Alternatively, we could just split the problem in 2 and coordinate a search for values between two agents. This would involve trying to find solutions to the sub-problems, then, if one agent succeeds, the valid variable assignment it found will be sent to the other agent to constrain its activity. If this subsequent search fails, both agents will negate that variable and continue searching. This approach is less parallel and isn’t very friendly to incomplete algorithms, but it will actually decompose the problem. However, the number of messages exchanged between the agents will be exponential with respect to the size of the vertex separator. In the limit, this will be something like synchronized search in distributed CSP.
These sorts of tradeoffs tend to be fairly generic unless the problem has some kind of special structure.
The problem is that decomposition is itself an NP-hard problem, meaning finding simpler subproblems whose solutions can be efficiently combined will generally be as difficult as simply solving the original problem in the first place. However, if a problem has sufficient structure, then smarter decompositions are possible.
There is a generic notion of “CSP decomposition”. The method involves transforming the graph of a problem into a tree. Generally, this is done by clustering variables in the original problem into nodes. If there are variables X, Y, and Z with domains D, E, and F, then their clustered variable would have domain D x E x F. There would be further constraints stating that these cluster variables must match on components that came from shared variables. CSP decomposition specifically focuses on incomplete decomposition methods which are efficient to compute. There are lots of ways to do this, but it seems that “hypertree decomposition” is the dominating method.
This is done because acyclic CSP problems are always tractable as you can find a satisfying assignment without ever needing to backtrack. This section of this lecture gives a good description of the procedure;
We may apply something like this to our problem; it would allow us to place node sub-problems into agents, and we’d need a more sophisticated method to solve the tree-shaped meta-problem to get a full solution. It’s also worth noting that we have no guarantee for the maximum size of the node sub-problems.
If we move to a more structured domain, namely ILP, there are more and more useful decomposition methods.
There are a few common methods. Generally, these focus on finding essentially independent dimensions of the problem that can be used to split it into smaller subproblems. There will be “linking” variables, whose constraints are shared over essentially all the subproblems, and there are “block” variables that can be solved in relative isolation. Due to the linking variables, these techniques don’t actually make the problem asymptotically smaller, rather, the problem is split into easier-to-solve subproblems which are, in the worst case, about the same size as the original problem. They also generally only apply to problems with the right structure. However, if a problem has only a small number of linking variables, the sub-problems could be substantially smaller than the original problem. We’ll have to survey examples that we’re interested in to get to see if we’d get an advantage.
Distributed CSP solving
The idea is that we have numerous agents working in parallel, each responsible for a finite segment of the CSP problem (generally, a small set of variables or a single variable). They repeatedly use knowledge of their local environment to search for solutions and continuously send messages to locally relevant agents in order to ensure a cooperative maximum.
The setup I described before, where there is only one-way communication through a single message that hands off problems from one agent to another in a hierarchy, is manifestly incompatible with almost all approaches to DisCSP. The relevance of DisCSP is dependent on whether the paradigm of solving, as described to me, can be abandoned essentially in its entirety for a more connected network with more communication overhead. If we look at some incomplete approaches, such as Communication-Free Learning, it is compatible with this setup, but such algorithms are rare and considerably less studied than algorithms with two-way message passing.
It’s also worth noting that there’s not generally going to be an algorithmic advantage to distribution. However, for certain kinds of problems, there are heuristic reasons to think distributed solving might work well. From chapter 20 of “Handbook of Constraint Programming”
In general, distributed techniques work well only when the problem is sparse and loose, i.e. each variable has constraints with only a few other variables, and for each constraint, there are many value combinations that satisfy it. When problems are dense and tight, usually the distributed solution requires so much information exchange among the agents that it would be much more efficient to communicate the problem to a central solver.
This does seem to describe our problem.
Generally, we will have to trade-off between completeness, message size, and message number. All complete algorithms (ones guaranteeing an optimum) are either exponential in the number of messages, the size of messages, or both. This would seem to limit us to incomplete algorithms, some of which have linear or constant message size and number.
Relevant sources for details are
For incomplete algorithms, we could use something as simple as hill-climbing (essentially gradient descent), possibly augmented with a stochastic search of some kind. In practice, more complex algorithms are used, see section 4.3.5 in that first source. If we choose to go down this route, I’ll write a separate follow-up article comparing our options with respect to how compatible it is with our domain.
Based on the setup described to me by @cwgoes, we need a way to “simplify” a CSP so that it can be passed onto another solver. One question that should be asked is if the solution to a different CSP can even help with the current CSP. If so, passing the simplified CSP will accelerate later solving. The question of leveraging solutions to produce faster solutions within CSP is called the promise CSP problem. Unfortunately, the field is simply too primitive to give concrete results. Only a small number of families of PCSPs have been classified, none quite matching our domain.
Propagators are a model for designing distributed algorithms. They are similar to Petry nets, except, instead of transferring tokens, lattice elements are instead transferred. Each node stores a lattice element, and when a new element is received, the meet of the two elements is taken and stored. The idea is that each meet is an update on new information. The lattices could be subsets of some collection of elements (the powerset lattice), or intervals that could be made more specific with updates, etc. According to some sources, common CSP and SAT-solving algorithms are special cases of this, but I can’t find a clear account of how.
Section “6.3 Dependencies for Implicit Search” may offer a basic description of a propagator network for resolution-based SAT solving which might be adapted with other methods.
is an attempt at creating a modern propagator library.
As for what lattices we could use, the field of abstract interpretation offers a lot of work on which lattices are useful for tracking the flow of information through programs.