Private solving using MPC

This is an initial draft of a research proposal on private solving using MPC. It contains a summary of my understanding of the setup together with the potential directions I consider. @cwgoes, @vveiln, @apriori, @degregat I seek your feedback to correct/widen my understanding if it is mistaken or incomplete, and to settle the problem formulation. Please let me know if the scope seems trivial or infeasible. Questions are italicized to prevent them being missed. I understand that answering some of them could already be part of this project - I just want to be sure that they are to-be-answered in that case.

@cwgoes - Based on the feedback I get for this, I might write a better-organized version for Mary (and potentially add more context for some parts).

I start with a quick list of two possible research directions since I will refer to them while introducing the setup. Details on them will follow.

  • Generic MPC: Compatible with any algorithm. Provides flexibility, comes with practicality concerns.
  • Specialized MPC: MPC protocols tailored for specific problems. Limited/no flexibility on algorithm choice, advantageous for efficiency.

What is an intent?
As far as I understand from @degregat, intents are implemented as imbalanced transactions. After solving is completed, a new transaction needs to be computed anyway. So, I think I can consider intents as an abstract entity, to be encoded in whatever way is convenient for MPC. Intents essentially contain want and have fields, which include the kind and quantity. There are also different representations of intents in the kudos-snippets github repo, and I am not sure how general the intents I consider should be. Considering that the end goal is a PoC implementation for Kudos, it might be best to catch up with what the requirement is there.
My suggestion to scope intents for this project would be: intents contain have.kind, want.kind, rate, max_quantity, where rate is the desired amount of the want.kind resource per 1 have.kind resource and max_quantity is the maximum amount of the have.kind resource to trade. Any solving strategy that trades have.kind and want.kind at the desired rate up to max_quantity is valid.
I guess this view of intents is already practically useful, and it is restricted enough to leverage cycle-detection/flow-maximization style algorithms. This is especially useful if we investigate specialized MPC protocols rather than generic frameworks. What do you think about this scoping of intents? There seem to be some weights in the Kudos Juvix specification; should I consider weights too? If so, I need to understand how weights are expected to work. Who is a source of up-to-date information on this?
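To make the proposed scoping concrete, a minimal transparent (non-MPC) sketch could look like the following. All names here (Intent, is_valid_pairwise_match) are illustrative and not taken from any existing Anoma/Kudos code; this only encodes the have.kind/want.kind/rate/max_quantity fields described above for the simplest case of a pairwise match.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical encoding of the intent scoping proposed above.
@dataclass
class Intent:
    have_kind: str
    want_kind: str
    rate: Fraction        # desired amount of want.kind per 1 have.kind
    max_quantity: int     # maximum amount of have.kind to trade

def is_valid_pairwise_match(a: Intent, b: Intent, qty_a: int, qty_b: int) -> bool:
    """Check whether a giving qty_a of a.have_kind to b, and receiving
    qty_b of a.want_kind back, satisfies both intents."""
    kinds_ok = a.have_kind == b.want_kind and b.have_kind == a.want_kind
    caps_ok = 0 < qty_a <= a.max_quantity and 0 < qty_b <= b.max_quantity
    # Each party must receive at least their desired rate.
    rates_ok = (Fraction(qty_b, qty_a) >= a.rate and
                Fraction(qty_a, qty_b) >= b.rate)
    return kinds_ok and caps_ok and rates_ok
```

For example, with `alice = Intent("A", "B", Fraction(2), 10)` (wants at least 2 B per A) and `bob = Intent("B", "A", Fraction(1, 2), 20)`, trading 5 A for 10 B is a valid match, while 5 A for 9 B is not.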

What is solving?

In this context, I use the term solving to refer to the process of:

  1. Collecting intents from the users and matching a suitable subset of them: I see two questions here. The first is the choice of matching algorithm; the second is running this (or an) algorithm without revealing the users' intents. More details below.
  2. Creating a (possibly shielded) transaction: To my understanding, this is what Taiga handles. Does this imply that we have to stick with generic MPC constructions, at least for transaction creation? Can Juvix be used to convert Taiga code to circuits? In that case, we might be able to use some of the existing frameworks on Taiga without too much modification.

How to match (without the question of privacy)?
The answer to this question depends heavily on the definition of intents and of a valid match. Below is some general discussion.
I have found the topic Intent Languages: CSP Formulation, where a CSP formulation of intent solving is discussed and a solution based on integer programming is adapted. I think the discussion there is useful, but the generality of CSPs seems to be an obstacle to coming up with a tailored, efficient solution. Investigating whether certain CSP classes are known to be solvable with MPC can be an option. Here, I am not sure whether the decision or the maximization version (the checking vs. solving discussion) is more relevant. With MPC, we can either require some subset of the participants to be matched (which can be all participants, too) or require all participants to be matched. We need the optimization version for the former, but both the optimization and decision versions work for the latter.

The problem can also be viewed as a flow maximization problem. If we assume that there are many kinds of resources to be matched (which sounds reasonable for Kudos), an initial step to match intents could be matching want-have kinds and postponing the numerical matching. For this, cycle detection algorithms can be useful (cycle detection seems to be a separately studied problem in the MPC setting). If we expect a small set of possible resource kinds that an intent can contain, then this step might hurt privacy, in addition to being unnecessary.
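In the transparent case (ignoring MPC for the moment), the kind-level matching step amounts to searching for a directed cycle in a graph whose edges go from each intent's have.kind to its want.kind. A minimal illustrative sketch, with hypothetical names:

```python
# Transparent (non-MPC) sketch of kind-level matching: each intent i adds a
# directed edge have_kind -> want_kind; a directed cycle through distinct
# kinds is a candidate set of intents that could settle each other.
def find_kind_cycle(intents):
    """intents: list of (have_kind, want_kind) pairs.
    Returns a list of intent indices forming a directed cycle, or None."""
    edges = {}  # have_kind -> list of (want_kind, intent_index)
    for i, (have, want) in enumerate(intents):
        edges.setdefault(have, []).append((want, i))

    def dfs(kind, start, path, used_kinds):
        for want, i in edges.get(kind, []):
            if want == start:                 # closed the cycle
                return path + [i]
            if want not in used_kinds:
                found = dfs(want, start, path + [i], used_kinds | {want})
                if found:
                    return found
        return None

    for start in edges:
        cycle = dfs(start, start, [], {start})
        if cycle:
            return cycle
    return None
```

For example, intents A→B, B→C, C→A form a cycle and would be returned as candidates; the numerical matching (rates and quantities) would then be checked on top of this.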

How to solve privately:
Some candidate approaches for private solving are discussed in this report and blog post. For this project, I will focus on the MPC approach. With MPC, we aim to prevent leakage of users’ intents. In this setting, users jointly play the role of solver without sending their intents explicitly. I see two potential scenarios to consider:

  • MPC participants only consist of the users. Users jointly match the intents and settle the transaction.
  • There is an additional MPC participant, who is responsible for handling heavy computations while users are mostly responsible for exchanging easy-to-compute messages. This might be useful to consider if an MPC protocol for the above item requires too much computational resources for users. (I don’t even know if this kind of construction exists.)

As a counter-argument to employing the second approach, there seems to be a recent MPC paper (with an open-source implementation) that is claimed to be scalable and optimized for lightweight participants. Checking this specific work in detail may be a good start to the project.

Matching in MPC could mean both matching the intents of all participants (i.e. decide whether the participating users could all match, and then create the transaction if so) and matching the intents of a subset of the participants. I believe both are valuable, and I consider focusing on whichever seems more promising along the way. Does this sound okay? Either of them can be favorable depending on topology assumptions (which I believe we don't have).

As @vveiln already pointed out, there are some obstacles that we may need to keep in mind:

  • Since users are not unconditionally trusted, a user can attack the privacy of another user by crafting intents. This is proposed as a reason to consider more than 2 parties in the computation. Isn't it also possible, for any n, that all n-1 other users engaging in joint solving are malicious and colluding? It might be possible to argue that for some big enough n, this is as unlikely as we can accept. What is a reasonable definition of big enough n here? Is considering 3-party computation worth the privacy gain in exchange for the lost efficiency? I guess answering this question requires one input from the network side (the rate of increase in privacy with n) and another from the MPC side (the rate of efficiency decay with n).
  • A follow-up question to ask is "what should be the minimum number of intents matched to settle a transaction". Even though many parties participate in the solving protocol, similar attacks could be possible if it is possible to match just 2 intents.

Do we have some answers/preference on the answers to these obstacles or should they be kept as side-questions in the project?

Existing PoC implementations for solving research:

  • kudos-snippets github repo: Stand-alone (i.e. no integration with Taiga etc./transparent case only) demonstration of the general idea of intents and solving using a CSP solver.
  • mpc-solver-approx github repo: Stand-alone (i.e. no integration with Taiga etc./transparent case only) demonstration of a basic solving strategy using MPC framework MP-SPDZ.

Scope proposal:
As far as I understand, implementing solving with MPC for some specified algorithm is not the big question. Although there is no standard MPC framework yet (apparently NIST is engaging in some standardization process soon - but definitely not the solution for us in the near future), there are available frameworks that can run any algorithm that we specify in their language. MP-SPDZ (github, paper) is the one @vveiln already used for a PoC, and awesome-mpc has an extensive list of frameworks that we can check. The main problem seems to be the practicality of the implementation.
Potential directions to focus on here are:
1 - Conducting literature research to list generic frameworks/MPC protocols with advantages and disadvantages. Benchmarking a chosen solving algorithm on different MPC protocols using a comprehensive framework like MP-SPDZ. This can include sketching trade-offs between different efficiency measures like number of rounds, size of messages, total computation time, and local resources used.
Potential (rough) agenda for this: Collecting advantages/disadvantages of existing theoretical/implemented generic constructions/frameworks. Picking a framework for benchmarking and learning how to use it. Running experiments for benchmarking and reporting them.
Struggle point: Whatever generic MPC framework we use, it will not be useful if it is not compatible with Taiga. Is the correct question to ask here finding a framework that is compatible with Taiga, or finding a good protocol and implementing it in a way compatible with Taiga (I don't expect the latter to be something I can achieve within the scope of this project)?
2 - Trying to optimize the MPC and solving algorithms together. This approach includes iterating over different solving strategies by checking specialized MPC protocols, and/or trying to develop good solving strategies out of the problems that can be efficiently solved with MPC. Potential problem here: What about after matching is done? Even if we handle the matching part with a specialized MPC, do we have to go back to some generic MPC for transaction creation?
Potential (rough) agenda for this: Work on the protocol+algorithm design. PoC implementation for the chosen design. Possibly choosing a few designs and experimenting with the tradeoffs. Reporting.

Which one of these two directions is good/better to pick? Or am I completely off-track?


This work on collaborative zk-SNARKs may be relevant for the sub-question of how to produce proofs in MPC which are compatible with Taiga.

Highlights from Saturday meeting:
We discussed separating the problem into two subproblems. The first one is matching intents privately (MPC) and the second one is creating transactions (hopefully Taiga). The main problem here is deciding what the MPC should output so that the second part can be executed. Another potential direction is checking which useful functionalities for the second part are doable with MPC (like collaborative zk-SNARKs).
TODO for myself: add diagrams to visualize data flow.


I think there are a few versions of the problem that we discussed, based on how the transaction is created. For all of them, matching intents is done via MPC.

  1. Having a solver to run the transaction creation:

    (diagram omitted)

    With the questions:

  2. Running MPC for transaction creation:
    a. Running Taiga circuit:

    (diagram omitted)

    With questions:

    b. Running required functionality separately inside MPC: (only the blue part differs compared to (a))

    (diagram omitted)

    With questions:

The Helium paper sounds like a good starting point!

It seems to me that the question behind the risks you see in approach 1 and 2 is the following:
Does it seem likely that it will be fundamentally impossible to convert between shares of circuit representations of two different MPC schemes? If so, how does it look for schemes in the “same class” (e.g. based on the same or similar primitives)?

This would be an interesting question to answer in general, and would probably tell you more about whether approach 1 or 2 seems most viable.

I think this would be a sufficient scope for this project on its own, even if there is no time remaining to implement the matching or TX computing parts afterwards.

I found a book on composition of SMPC protocols.

edit: PDF link


After last week’s meeting, I have realized that the focus should be on the second phase of solving intents privately (computing the transaction).

As a summary, private solving involves two main phases:

  1. Matching the intents
  2. Computing the transaction

while keeping the content of intents private. We also aim to keep the work on the users’ side minimal.

The intent matching phase should run some sort of search algorithm and should end with deciding which users’ intents will be matched (and for which quantities, if applicable). I am not sure what the exact output of this phase should be, and I think this can be decided based on the results we obtain for the second phase.

Computing the transaction phase (Taiga) involves (operations requiring private information):

  • Creation of new resources: This should involve encryption of the resource plaintext and committing to the new resource. I am still confused about the way this encryption is defined in the Taiga specs - it looks to me like the solver is able to decrypt later on since it knows the private key, which would be a privacy violation @vveiln. A subquestion for private solving is defining this encryption properly.
  • Nullifying input resources
  • Computing compliance proof record (as a zkSNARK) which requires private inputs:
    – Input and output resource plaintexts
    – Openings of commitments for input and output resources
  • Computing resource logic proof record (as a zkSNARK) which requires private inputs:
    – Input and output resource plaintexts
  • Computing binding signature for transaction delta which requires private inputs:
    – Input and output resource quantities and kinds
  • Computing extra data (application dependent, potentially private computation)
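The binding signature input listed above is essentially a per-kind balance: summing quantities over consumed (input) resources minus created (output) resources, per kind, should give zero for a balanced transaction. In Taiga this is enforced homomorphically over commitments; the transparent sketch below just illustrates the balance condition itself, with hypothetical names:

```python
from collections import defaultdict

# Transparent sketch of the transaction delta: the net, per-kind sum of
# quantities over input (consumed) and output (created) resources.
def transaction_delta(inputs, outputs):
    """inputs/outputs: lists of (kind, quantity).
    Returns {kind: net_quantity} for kinds with a nonzero net."""
    delta = defaultdict(int)
    for kind, qty in inputs:
        delta[kind] += qty
    for kind, qty in outputs:
        delta[kind] -= qty
    return {k: v for k, v in delta.items() if v != 0}

def is_balanced(inputs, outputs):
    # A balanced transaction creates exactly as much of each kind as it consumes.
    return transaction_delta(inputs, outputs) == {}
```

For private solving, the relevant observation is that this computation is linear in the quantities, so on secret-shared quantities (of known kinds) it would need no communication; the hard part is that the kinds themselves are also private.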

Approach 1
We can investigate whether it is possible to run a non-private implementation of solving as a whole inside MPC. For example,

  • Silph: A Framework for Scalable and Accurate Generation of Hybrid MPC Protocols investigates a framework for compiling a program written in a high-level language to an MPC protocol that is a secure mix of multiple MPC primitives. The downside is that their PoC targets C++.
  • If we can convert the whole implementation to a single circuit, we can employ frameworks like Helium (MHE-based). The advantage of Helium is that it fits nicely into our setup, where
    – users want to keep their input private to themselves,
    – users don’t want to compute a lot,
    – solver can handle heavy parts of the computation.
    The paper Blind Evaluation Framework for Fully Homomorphic Encryption and Privacy-Preserving Machine Learning addresses the problem of having to decrypt FHE ciphertexts at each round in order to execute program logic. The binary circuits they constructed for some program-logic operations could be useful for our problem, too.

Disadvantages:

  • Too ambitious: this might not be a feasible approach in practice
  • Requires focusing on all parts at once - can be complicated

Advantages:

  • A generic solution is needed for application-specific, custom private operations; otherwise, each such application’s development will involve its own question of private computations
  • Work on the algorithm design will be (at least mostly) decoupled from the cryptographic concerns.
  • Updates on Taiga will not imply any updates on cryptography for privacy.

Approach 2
We can investigate an MPC solution for each part separately and ways of composing them.

  • Composition question involves several aspects:
    – Which information should be contained in each subproblem’s input/output. Secret sharing the private information at the beginning and then operating on the shares can be one option. Letting each user hold their own private, meaningful data for each subproblem’s MPC can be another.
    – The data types of inputs/outputs for different MPC protocols. As @degregat pointed out, converting inputs/outputs between different MPCs might not be practically possible if they employ incompatible fields (like the same field with different parameters, or different fields with no known correspondence).
    – Security: Composing different MPC protocols may hurt privacy even though each of them provides privacy on its own. It seems like what @cwgoes referred to focuses on this aspect (and universal composability seems to be enough):

We should keep the security aspect in mind even for approach 1, since running the same MPC protocol several times with different inputs might enable some privacy attacks.

  • We can investigate approach 1 solutions for some of the subproblems.

Disadvantages:

  • We still need to investigate approach 1 for application-specific, custom private operations; otherwise, application development with additional private data will still involve the question of private computations
  • Updates on Taiga may require re-investigation and replacement of some parts
  • Algorithm design and encodings of inputs/outputs for each part is coupled with cryptographic concerns.
  • Solutions for the smaller problems are not independent. We still have the concern of composing them.

Advantages:

  • Smaller problems to attack:
    – Fewer things to focus on at once
    – Possibility of finding tailored solutions → practical solution

Plan proposal
I think we want to follow approach 2. For this, I propose starting with the resource creation part, since the way we choose to handle privacy there will potentially impact how we manipulate private inputs for the proof creations/signature generations.

  1. Decide how resource plaintexts will be encrypted. Further discuss the constraints/requirements/options with @vveiln
  2. How will we execute the commitment?
  3. PoC implementation
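As a reference point for step 2, here is a plain (non-MPC) commitment sketch. Note the hedge: Taiga uses an algebraic, Poseidon-based commitment, which is what would actually have to be evaluated inside MPC; SHA-256 is only a stand-in here to illustrate the commit/open interface, and all names are mine.

```python
import hashlib
import secrets

# Stand-in hash commitment: commitment = H(randomness || plaintext).
# Hiding comes from the random opening; binding from collision resistance.
def commit(plaintext: bytes):
    randomness = secrets.token_bytes(32)
    commitment = hashlib.sha256(randomness + plaintext).digest()
    return commitment, randomness  # randomness serves as the opening

def verify_opening(commitment: bytes, plaintext: bytes, randomness: bytes) -> bool:
    return hashlib.sha256(randomness + plaintext).digest() == commitment
```

The MPC question is then: who holds the plaintext and the opening as shares, and can the commitment (and later the compliance proof, which takes the opening as a private input) be computed without reconstructing them in one place?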

After this part is done, I can proceed with investigating the creation of one of the proof records (no preference on which one).

TODO - add diagrams for data flow: @cwgoes I am actually not sure what should be included in the diagrams. One problem I have is that I am still not sure how users should be involved in the MPC - and I now think this is a question to answer on-the-go.


The current Taiga spec is defined for a non-MPC/non-private-solving case. In that case, the solver is the one who encrypts the payload, so it isn’t a big deal that they can decrypt it - they saw the plaintext anyway

Plus the custom private inputs we don’t control

Keep in mind it has to be both ZKP- and MPC-friendly because we verify encryption in ZKP circuits

So, we should replace this encryption with a new scheme based on objectives:

  • Solver can efficiently compute it without learning the content (MPC friendliness?)
  • User can later decrypt; solver cannot
  • Should be verifiable (question at the bottom of this post)

They are application-dependent custom private inputs, aren’t they?

Do you refer to compliance circuits here or should there be a separate verification for the encryption?

I would see it as adaptation, not replacement, but if we absolutely cannot adapt it, then yes

Yes

I think we do it in resource logics, not the compliance circuit

Still, this shouldn’t mean that we need verifiable encryption, since we deal with creating the proof record for the compliance circuit, should it?

I’m not sure I’m following, but the need for verifiable encryption and compliance circuit proof creation are orthogonal questions really. In a transaction we create proofs for both the compliance circuit and resource logic circuits

Sorry, my bad. I wanted to write resource logic proof record here instead of proof record for the compliance circuit.

Some quick notes on encryption experiments:
I tried running Hydra (using an existing implementation) and Poseidon-based encryption (which I implemented) where:

  • using the MASCOT protocol (maliciously secure with a dishonest majority) of the MP-SPDZ library (i.e. programs written in its Python-based language),
  • using Pallas base field (256 bit prime),
  • a (precomputed) secret key (including the nonce) and a plaintext consisting of 10 field elements are shared among 2 players.

(For a fixed randomness, I confirmed that the existing Taiga encryption implementation gives the same result as my Poseidon-based implementation.)

Notable results:

  • The whole (local execution, no network delays) protocol for Poseidon takes around 3 seconds, while Hydra takes around 0.5 seconds. These measurements were not taken in a rigorous way (i.e. by running only the experiment and repeating it a sufficient number of times), but I believe they still show that Hydra outperforms Poseidon, which is also confirmed by:
  • The Poseidon-based protocol requires ~1981 rounds and ~86 MB of data in total, while the Hydra one requires ~321 rounds and ~17 MB of data in total. Since I took an existing implementation for Hydra but implemented Poseidon myself, I might have missed some optimizations. However, a significant difference makes sense because:
    • The only nonlinear layer in both constructions is the S-box, which is of the form x**alpha. (Communication is required for nonlinear computations on shared values, or for opening shared values.)
    • Hydra picks alpha as 3. The version of Poseidon that Taiga currently uses picks alpha as 5.
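The gap can be seen from multiplication counts alone: in secret-sharing MPC, each multiplication of shared values costs communication (e.g. one Beaver triple plus an opening), and x**3 needs fewer such multiplications than x**5. A quick illustrative count via square-and-multiply (this ignores protocol-specific optimizations such as cheaper squarings or Hydra's offline trick discussed below):

```python
# Count multiplications of shared values needed to compute x**alpha via
# square-and-multiply (MSB first). Each counted multiplication would cost
# a round of communication in a secret-sharing MPC protocol.
def sbox_mult_count(alpha: int) -> int:
    count = 0
    bits = bin(alpha)[2:]
    for bit in bits[1:]:
        count += 1          # squaring step
        if bit == "1":
            count += 1      # multiply-by-x step
    return count
```

This gives 2 multiplications per S-box for alpha = 3 (x**3 = x**2 * x) versus 3 for alpha = 5 (x**5 = (x**2)**2 * x), consistent with the round/data gap observed above once multiplied over all S-boxes and rounds of the permutation.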

Extra notes on optimization:

  • Hydra uses a trick to efficiently (in some respects) compute cubes. Their trick enables them to move the multiplications of shared values mostly to the offline phase. As far as I understand from the paper, this also enables batching, which reduces the number of rounds. I think a similar trick could be used for Poseidon, but I don’t think we can hope for a smaller total amount of computation/data transmitted.

Future intentions:

  • I think a natural next step is comparing their performance for proof generation. Based on how much (and where) one outperforms the other for proof generation, it might be wise to consider changing the choice of hash function.