Shorter FMD public keys - is it possible?

In the FMD1 and FMD2 constructions of Fuzzy Message Detection, to achieve a minimum false-positive rate p=2^{-\gamma}, the number of ElGamal public keys is \gamma.

If for example we fix \gamma = 16 and ElGamal encryption is instantiated over any popular 256-bit ellitpic curve, then the size of the FMD public key is 1KB. Larger choices of \gamma yield even larger public keys.

Question: How can we reduce the size of the public keys while maintaining the same minimum false-positive rate?

This will improve the efficiency of the FMD test run inside the TEEs in the FMD-TEE proposal, and would make generation of ZKPs for flag consistency less expensive.

1 Like

A (partial) solution that minimizes communication bandwidth is to derive the \gamma public keys from a master public key. Only the master public key needs to be communicated with senders.

For the FMD1 or FMD2 schemes , the public keys are elliptic curve points. A BIP32-like derivation can then be done.

  • The key owner (receiver) initially does \gamma derivations parent private key → child private key to generate the \gamma kepypairs (sk_i,pk_i)_{i\leq\gamma} from a master secret key sk_m.
  • The key owner transmits (makes public) the master public key pk:=sk_m\cdot G
  • The sender does \gamma derivations parent public key → child public key to recover the \gamma public keys (pk_i)_{i\leq\gamma} from the master public key pk_m.

Above G denotes a fixed generator of the cyclic group where the FMD scheme is instantiated. It will hold pk_i = sk_i\cdot G for all i\leq\gamma.

This solution only partially answers the question because it does not improve the efficiency of the FMD test.

1 Like

The above suggestion is actually insecure.

Edited in 24/01/2025

  • Updated the fix
  • Notation in general to better describe the fix.

The problem

tl;dr: In BIP32, normal child key derivation allows to recover the parent private key using the child private key and the parent public key.

If we are to derive \gamma FMD keypairs (s_i,P_i := s_i\cdot G) from the master keypair (x,X:=x\cdot G), then anyone with, say, only the first secret key s_1 (seen as a detection key) can recover the remaining secret keys s_2,\ldots,s_\gamma using s_1 and the master public key X (by recovering the master private key x first).

For illustration, let’s look at a simplified version of BIP32 using a cryptographic hash function \mathsf{H}:\mathbb{G}\times\mathbb{I}\rightarrow\mathbb{Z}_q. Here \mathbb{G} is the group where public keys live; it has order q and G is a fixed generator, and \mathbb{I} is a domain for a running counter i. (To derive a maximum of \gamma:=|\mathbb{I}| FMD keys.).

The simplified key derivation does as follows:

  1. h_i := \mathsf{H}(X,i)
  2. s_i := x+h_i\mod q
  3. P_i:=s_i\cdot G

Esentially, this is BIP32: hash public-key material and do a modular addition of the digest and the parent private key.

Public derivation of the public key P_i is possible because P_i= X+h_i\cdot G and h_i is publicly computed. But, it also holds

x = s_i-h_i\mod q.

So. anyone with s_i can recover the master secret key x.

This is not a problem in many contexts because usually the child secret keys are not be disclosed. But in the FMD context, the child private keys s_i are disclosed (will be part of the detection keys).

The fix

Let’s start using two master keypairs (x_1,X_1),(x_2,X_2). The second keypair for blinding. To derive child keys, hash X_1,X_2,i onto h_i and set s_i=x_1+x_2+h_i. Again, public key derivation is possible because P_i=X_1+X_2+h_i\cdot G. But now the knowledge of s_i only allows to recover x_1+x_2, which reveals no information on x_1 or x_2. Instead need two child keys s_i,s_j to fully determine x_1 and x_2.

In general, for fixed parameter m, we start with m master keypairs (x_1,X_1),\ldots (x_m,X_m). Because we want public derivation of the child public keys P_i from X_k we are forced to derive the child secret keys s_i as public linear combinations of the master secret keys x_k. There is no way around to this.

Whatever public linear combination we choose (coefficients a_{i,k},h_i below), the knowledge of n child secret keys s_i establish a linear system of n equations for the m unknown master secret keys x_k:

\begin{aligned} &s_1 = a_{1,1}x_1+\cdots + a_{1,m} x_m+h_1\\ &\qquad\qquad\qquad \vdots \\ &s_{n} = a_{n,1}x_1+\cdots + a_{n,m} x_m+h_n \end{aligned}

If n<m, the system is undetermined, and there are infinitely many possibilities for the master keys x_1,\ldots,x_m. But if n\geq m we can find a solution. This imposes the following threshold assumption:

:warning: Assumption: If there are m master keys, then no more than t:=m-1 child keys are compromised.

The suggested key derivation.[Updated on 14/03/2025 to address Yulia’s comment.] We start with m master keypairs and use two cryptographic hash functions \mathsf{H}:\mathbb{G}^m\times\mathbb{I}\rightarrow\mathbb{Z}_q, \mathsf{G}:\mathbb{G}\times\mathbb{I}\rightarrow\mathbb{Z}_q. Derive the i-th child keypair as:

  1. h_i:=\mathsf{H}(X_1,…,X_m,i), a_{i,k}:= \mathsf{G}(X_k,i) for 1\leq k \leq m
  2. s_i:=h_i+\sum_{k\leq m}a_{i,k}x_k
  3. P_i:=s_i\cdot G

Public derivation of public key P_i is done setting P'_i:=h_i\cdot G+\sum_{k\leq m}a_{i,k}\cdot X_k. Clearly P_i=P_i'.

Public derivation and diversification

The scheme above is based in BIP32 and does not allow to diversify FMD public keys.

Here we discuss a different way to do key derivation that accounts for diversification.

Desiderata

We want to achieve the following two features.

  • Compactness: Derive an FMD public key pk_f from a compressed public key pk_c. Derivation must be public and it must hold size(pk_c)< size(pk_f). Only compressed keys pk_c are part of payment addresses.
  • Diversification: A user controlling m addresses wants to have m unlinkable FMD public keys pk_{a,f}, one per address a. All public keys pk_{a,f} being associated with the same FMD secret key sk_f (i.e. same sk_f for all user’s addresses).

Description

The system parameters are an integer t\geq 2, and group \mathbb{G} of order q (the domain of public keys).

High level idea. The idea is to use as a master key a secret polynomial \tilde{p}(X)\in\mathbb{Z}_q[X] of degree t. The corresponding compressed public key is the polynomial in “the exponent”, thus a vector of t+1 points in \mathbb{G}. To derive arbitrarily many secrets, evaluate \tilde{p}(X) at \gamma>t different public evaluation points x_1,\ldots,x_\gamma. Unlinkability is achieved by randomizing the compressed public key.

The different keys are defined next.

  • Master secret key An at most t-degree polynomial \tilde{p}(X)=\sum_{k=0}^t c_kX^k in \mathbb{Z}_q[X]. Thus:
sk_m:=(c_0,\ldots,c_t)\in\mathbb{Z}_q^t

Each coefficient c_i is sampled randomly or derived from other secret-key material controlled by the user.

  • Master public key. The coefficients of \tilde{p}(X) encoded in the exponent of a fixed basepoint G\in\mathbb{G}. Thus
pk_m = (C_0,\ldots,C_t)\in\mathbb{G}^{t+1}\text{ where } C_i = [c_i]G.
  • FMD secret key The FMD secret key sk_f for a batch of addresses a_1,\ldots,a_m (controlled by the user that knows sk_m) are \gamma evaluations of the secret polynomial \tilde{p}(X) at public points x_1,\ldots,x_\gamma. Thus:
sk_{f} := (\tilde{p}(x_1),\ldots,\tilde{p}(x_\gamma))\in\mathbb{Z}_q^t
  • Compressed FMD public key For address a, the compressed public key pk_{a,c} is a randomization of the master public key pk_m. Choose randomizer r_a\in\mathbb{Z}_q, and define address basepoint G_a = [r_a]G, and C_{a,i}:=[r_a]C_i, then
pk_{a,c}:=(G_a,C_{a,0},\ldots,C_{a,t})\in\mathbb{G}^{t+2}.

Note that pk_{a,c}= (G_a,[c_0]G_a,\ldots,[c_t]G_a). Thus, it encodes the secret polynomial \tilde{p}(X) using address basepoint G_a.

  • FMD public key. The uncompressed FMD public key for address a are the evaluations of \tilde{p}(X) “in the exponent” using basepoint G_a. Namely if P_{a,i} = [\tilde{p}(x_i)]G_a, then
pk_{a,f} := (P_{a,1},\ldots,P_{a,\gamma})\in\mathbb{G}^{\gamma}

Public derivation

The uncompressed FMD public key pk_{a,f} is obtained evaluating the encoded polynomial in pk_{a,c} at public points x_1,\ldots,x_\gamma. Concretely:

P_{a,i} = \sum_{k=0}^t [x_i^k]C_{a,k}

for all i\leq\gamma.

Diversification

All FMD public keys pk_{a,f} derived from the same master secret key sk_m:=\tilde{p}(X) are associated with the same FMD secret key sk_f := (s_1,\ldots,s_\gamma), where s_i = \tilde{p}(x_i). This is because, as noted, each compressed public key pk_{a,c} encodes \tilde{p}(X) using basepoint G_a.

Batch-diversification

It may be desirable to not have all the set of m addresses controlled by the same FMD secret keys, but yet controlled by the same master secret/public keys. This protects against potential leakage of sk_f.

We can achieve this by evaluating pk_m at distinct set of public evaluation points (x_1,\ldots,x_\gamma), (x'_1,\ldots,x'_\gamma). Each batch would yield different FMD secret keys sk_f := (\tilde{p}(x_1),\ldots,\tilde{p}(x_\gamma)), sk'_f := (\tilde{p}(x'_1),\ldots,\tilde{p}(x'_\gamma)).

Dynamic choice of \gamma

The maximum false-positive detection rate 2^{-\gamma} can be chosen by the sender at the moment of sending the message – i.e. at the moment of creating the FMD flag ciphertexts. This is because it can evaluate the compressed public key pk_c at arbitrarily many public evaluation points x_i.

Flow diagram

pk_m = (C_0,\ldots,C_t) \stackrel{\stackrel{\text{ compressed public keys}}{\text{randomize}}}{\longrightarrow} \begin{cases} & \mathsf{addr_1}: pk_{1,c}:=(G_1,C_{1,0},\ldots,C_{1,t}) \\ & \vdots\\ & \mathsf{addr_m}: pk_{m,c}:=(G_m,C_{m,0},\ldots,C_{m,t}) \end{cases}
\stackrel{\stackrel{\text{ uncompressed FMD public keys}}{\text{evaluate in the exponent}}}{\longrightarrow} \begin{cases} & \mathsf{addr_1}: pk_{f,1}:=(P_{1,1},\ldots,P_{1,\gamma_1}) \\ & \vdots\\ & \mathsf{addr_m}: pk_{f,m}:=(P_{m,1},\ldots,P_{m,\gamma_m}) \end{cases} \text{ possibly } \gamma_i\neq\gamma_j

Unlinkability

If the DDH problem is hard in group \mathbb{G}, then the master public key pk_m and any set of m compressed public keys pk_{a,c}:=([r_a]G,[r_ac_0]G,\ldots,[r_ac_t]G) look unrelated.

Let addresses a ranging 1\ldots m. By the DDH assumption, we have that for any coefficient c_i it holds

(G,[c_i]G,[r_1]G,[r_1c_i]G,\ldots,[r_m]G,[r_mc_i]G)\approx (G,[c_i]G,[r_1]G,R_1,\ldots,[r_m]G,R_m),

for random points R_j.

Tradeoffs: key-recovery threshold and address size

:warning: Note: This is is a threshold scheme. If more than t FMD secret keys are leaked, then the master secret key can be recovered.

For example, if t=2, then addresses only need to have 3 points in \mathbb{G} for the compressed public key pk_{a,c}. However, if 3 FMD secret keys s_{i_1},s_{i_2},s_{i_3} are leaked from sk_f, then can recover all the other FMD secret keys s_{i_j} for 1\leq j\leq \gamma (with Lagrange interpolation can recover the 2-degree secret polynomial \tilde{p}(X)).

In general, with t+2 points in addresses need to leak at least t+1 FMD secret keys. Esentially, we are using Shamir secret sharing to generate each secret key s_i, so any set of t or less s_i leak no information whatsover about the secret polynomial \tilde{p}(X).

So larger t yields better security but larger addresses.

Impact in FMD2 implementation

The modifications of the FMD2 scheme are minimal. The only modification is that the basepoint is not common to all users/addresses: when flagging and testing for address a, the base point G_a (included in pk_{a,c}) must be used.

I’m probably missing something, but if we use public key \sum{X_i} (ignoring the rest of the components here) and it allows to recover \sum{x_i}, isn’t it equivalent to having a single public key X = \sum{X_i} and recovering x (because the key shares are always used together, so we don’t actually need to know individual shares)? Or are we using the private/public keys in different combinations and not always together?

Good catch! You are right, recovering the sum \sum_{k\leq m} x_k allows to recover all child secret keys s_i:= h_i +\sum_ {k\leq m}x_k for i = 1,\ldots n.

I think the problem is that the same linear combination is used in all s_i. Thus, a_{i,k}=1 for all i = 1,\ldots, n, and k=1,\ldots ,m. What about using different linear combinations for each s_i?

For hash \mathsf{G}:\mathbb{G}\times\mathbb{I}\rightarrow \mathbb{Z}_q, the key derivation would be:

  1. h_i:=\mathsf{H}(X_1,…,X_m,i), a_{i,k}:= \mathsf{G}(X_k,i) for 1\leq k \leq m
  2. s_i:=h_i+\sum_{k\leq m}a_{i,k}x_k
  3. P_i:=s_i\cdot G

Public derivation of public key P_i is still possible. But now, knowing s_i gives away the i-th linear combination \sum_{k\leq m}a_{i,k}x_k, and it is not clear (to me) how to recover other linear combinations \sum_{k\leq m}a_{j,k}x_k for j\neq i. The obvious way would be to recover the m master keys x_i by solving the linear system, but for that need at least m child keys s_i to establish m linear equations.

Not sure if this answer your question: In my mind, the child key s_i will be the FMD detection key given to the i-th TEE. So, the idea is that at least m TEEs must leak s_i to be able to recover all s_1,\ldots s_\gamma (here \gamma = n), i.e. to recover the FMD private key which allows to test with the lowest false-positive rate.

1 Like