Private state synchronization
In this post I list and briefly review some of the existing solutions to the private state synchronization problem. The list is non-exhaustive, though.
Edits: (06/12/24) expanded problem statement; described aztec proposal; fixed typos.
What is this problem anyways?
Motivation: Zcash
In Zcash, roughly speaking, the sender encrypts notes under the encryption public key of the recipient of an asymmetric encryption scheme. The scheme is key-private, which ensures no information about the recipient’s encryption public key is leaked in the ciphertext. In other words, the encrypted note that appears on-chain is anonymous.
The curent approach for recipients to find out which (encrypted) notes are addressed to them is to download and try to decrypt all new notes appearing in the blockchain since his last sync. This is very inefficient.
What about Anoma?
In anoma the situation is similar. Users in order to find out their new resources with encrypted data (the private state) need to sync and try to decrypt the set of all settled transactions since their last sync.
A convenient abstraction
This problem can be abstracted as follows. We have a bulleting board \mathsf{BB} where user S (a sender) posts a message addressed to another user R (a receiver). The identity of the receiver is not explicit in the message; namely the receiver is anonymous.
- Detection problem: How can R find out efficiently which messages are addressed to them?
- Retrieval problem: How can R retrieve efficiently the messages addressed to them?
Privacy. in this context means that in the process of doing so, R does not want to disclose the location/content of the messages addressed to them.
Efficiency." The complexity of the recipient work is sublinear in the total number of messages contained in the bulletin board. Another criteria for efficiency is communication bandwidth. Thus, R cannot trial-decrypt or cannot do a full scan of \mathsf{BB} (or none of them).
Note. The detection problem can be used to solve the retrieval problem. But it is not straighforward. If R wishes to maintain their privacy, they can’t just publicly download those specific messages from \mathsf{BB}.
Trivial solution
Scanning and trial-decryption is delegated to a powerful third party TP. The obvious problem being that the identity of R is revealed if TP is dishonest.
Ad-hoc solutions
See the full list in this blogpost.
Zcash light client
Explained in ZIP-307. A third party retrieves transactions from the ledger, and compacts them before forwarding all of them to the recipient (wallet).
This reduces communication bandwidth, but wallets still must try to decrypt all compacted transactions.
In-device smart scanning
These refer to algorithms to find messages addressed to the recipient earlier. For example, exploiting the recipient’s transaction graph. Check this post for more details.
Pros:
- Deployed in wallets, so privacy is preserved.
- Improve user experience since some funds are found earlier.
Cons: Bad performance. Ultimately, to retrieve all funds a full scan is needed.
Using trusted enclaves
Check this paper as one example, and also this other one.
Fuzzy message detection (FMD)
Sits in between smart scanning and outsourcing to a TP. Check the paper here.
The outosurced scanning will return, along with transaction addressed to the recipient, a number of false positives (i.e. messages not addressed to the recipient.) The false positive expressed as a ratio parameter selected by the recipient.
Cons: There is a trade-off between privacy and complexity. To achieve full privacy, the recipient would need to scan the whole bulletin board.
Private information retrieval (PIR)
PIR allows a user to query an index of a database without revealing anything about the queried index to the server. A proposal in the aztec network suggests using batch PIR to reduce communication bandwidth.
High level idea
Messages in \mathsf{BB} are augmented with a short digest of a secret shared between the sender S and the receiver R (i.e. the result of a DH exchange). Also, an external untrusted server TP holds a copy of \mathsf{BB} in the form of a database.
Then R would:
- Download the digest of all messages from \mathsf{BB}
- Find out which contain their shared secrets. In other words, detect which messages are for them.
- Do batch-PIR queries to an external server to retrieve the rest of the data on their messages.
Cons. The recipient R still need to trial-find out which digests are addressed to them (step 2) before can they can PIR-query the database from TP.
The underlying PIR scheme. The proposal suggest to use the batch PIR described in this paper. The batch aspect uses known techniques in the PIR literature. The novel part being in the base PIR scheme used. The database is encoded as an hypercube, and the user encodes its queries as vectors, one coordinate per dimension of the cube. The queries are encrypted with an RLWE-based scheme. The reason being because it supports homomorphic rotation of the plaintext vectors. Homomorphic rotations (and additions and multiplications) are leveraged on the server side to return the data for the queried locations.
Oblivious Message Retrieval (OMR)
Introduced in this paper (see also this Zcash thread). The recipient R outsources the retrieval to an untrusted third party TP.
High level idea
The protocol has a setup phase, and a detection phase. Also, the sender S attaches extra information to the messages appended to the bulleting board \mathsf{BB}.
During setup, the recipient R generates a keypair (pk_R,sk_R) of an FHE encryption scheme with plaintext space \mathbb{Z}_2. Each R publish their public key pk_R.
Later, when the sender S creates the message m_i for R it also generates \ell ciphertexts, all encrypting 1 under pk_R:
Last, when R wants to sync, it sends pk_R to the third party. TP has access to all messages in \mathsf{BB}.
- TP re-encrypts each ciphertext vector \mathbf{c}_i (embedded in m_i) with pk_R. If m_i is not addressed to R, then the re-encrypted ciphertexts \mathbf{c}'_i encrypt garbage, i.e. a random bit each. This means that the AND of \mathbf{c}'_i encrypt 0 with high prob. (i.e. with prob. 1-2^\ell). Otherwise, if m_i is addressed to R, then the ANDed ciphertext c_i'':=\bigoplus_{j\leq\ell} c'_{i,j} encrypts 1.
- TP randomly arranges these N ANDed ciphertexts c''_i into k buckets, where k is a parameter chosen by R. The probability that a bucket contains more than one c''_i addressed to R is low (depends on N and k). Provided this event does not happen, XORing i\cdot c''_i in each bucket results in a ciphertext that encrypts i if m_i is addressed to R, or 0 otherwise.
- TP sends the k XORed ciphertexts to R. (To detect decoding failures, i.e. the event above, several bucket arrangement are created and sent to R.)
This shows how R can detect which messages are addressed to him. R can also retrieve the the messages themselves leveraging again the homomorphic properties of the encryption scheme.
Followup work: Group OMR.
Private signaling
Introduced here. The sender S constructs a signal c_i to attach it to a message m_i of the bulletin board. The signal is such that only the intended recipient R can identify it is for them, and it must be constucted without prior communication S \leftrightarrow R.
High level idea.
Two approaches presented.
Using trusted enclaves. The sender would first post their message m_i to the bulletin board, and then send the recipient’s identity and the location of the message encrypted under the public key of TEE hosted in the untrusted server TP. The TEE keeps internally a table storing (recipients,locations indexes):
so that recipient R_i can later query it. This approach is not really practical due to the resource constraints of TEEs, among other things.
MPC approach. The idea is to secret-share the table \mathbb{T} across different untrusted servers (e.g. across TP_1 and TP_2). The sender would send shares of the recipient R and of the location \mathsf{locd} to each TP_j. Now the servers run an MPC protocol to compute the following function on (secret-shared) inputs \mathbb{T},R,\mathsf{loc}:
- Update row of \mathbb{T} for R with \mathsf{loc}
- Randomize all cells in \mathbb{T}
The last step is needed to avoid leaking which row (recipient) was updated to the untrusted servers TP_j.
Later, R can query each of the servers the shares of their row, and reconstruct them to get the location indexes.
Follow-up works: Scalabe private signaling.
Inspired by anonymous messaging
Can techniques used for anonymous messaging be re-used in the context of private scanning?