
Designing and understanding the privacy properties of our algorithm #24

MxmUrw opened this issue Jan 23, 2023 · 1 comment
MxmUrw commented Jan 23, 2023

There are two approaches for generating noise:

1. Sample from the continuous ("float") Gaussian distribution

2. Sample from the discrete gaussian distribution
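
For intuition, here is a naive sketch (Python/NumPy) of sampling a discrete Gaussian restricted to a bounded set of integers, which is the setting we are in (see the first question below). The function name `discrete_gaussian_bounded` and the parameters are made up for illustration; this is an enumeration sampler with float weights, not the exact samplers from the papers below, and not suitable for production use:

```python
import numpy as np

def discrete_gaussian_bounded(sigma, bound, rng):
    """Sample a discrete Gaussian restricted to the integers {-bound, ..., bound}.

    Naive enumeration sampler for illustration only: it uses floating-point
    probabilities and is neither exact nor cryptographically secure.
    """
    support = np.arange(-bound, bound + 1)
    weights = np.exp(-(support.astype(float) ** 2) / (2 * sigma ** 2))
    return rng.choice(support, p=weights / weights.sum())

rng = np.random.default_rng(0)
sample = discrete_gaussian_bounded(sigma=10.0, bound=100, rng=rng)
```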

Papers:

  1. Deep Learning with Differential Privacy
  2. The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation

Questions:

  • Our distribution is not defined on all integers, but only on a subset, since we need to prevent overflow in the finite field that encodes our fixed-point numbers (Understand how being in a finite field instead of on the integers affects the distribution #22).
  • Paper 2 applies randomized rounding when converting the gradient vector into fixed-point representation. It seems this is not done for the sake of the privacy guarantee: simply rounding towards 0 should be enough for privacy, because that way the norm stays at most 1, so the argument that the overall function is 1-sensitive still applies. But is this true? (A quick numeric check of the rounding claim follows below.)
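
A quick numeric check of the "rounding towards 0 keeps the norm at most 1" part of the question (Python/NumPy sketch; `round_towards_zero` and the bit width are made up for illustration):

```python
import numpy as np

def round_towards_zero(x, n_bits):
    """Round each component towards 0 to an n-bit fixed-point value in (-1, 1)."""
    scale = 2 ** (n_bits - 1)
    return np.trunc(x * scale) / scale

rng = np.random.default_rng(0)
g = rng.normal(size=10)
g = g / np.linalg.norm(g)            # clipped gradient, L2-norm exactly 1
r = round_towards_zero(g, n_bits=16)

# truncation can only shrink each component, so the norm cannot grow
assert np.all(np.abs(r) <= np.abs(g))
assert np.linalg.norm(r) <= 1.0
```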

ooovi commented Jan 31, 2023

this explains why approach 2 (sampling from the discrete gaussian distribution) results in a DP mechanism.

for all c clients:
   1. compute gradient and clip to L2-norm 1
   2. round towards 0 componentwise to get an n-bit fixed-point vector in (-1, 1)
   3. project to integers in (0, 2^n) using p(x) = 2^(n-1) * x + 2^(n-1) componentwise
4. sum all gradients
5. add discrete gaussian (0, sigma²) noise componentwise
6. reduce modulo the field size to obtain a field element
7. project back to float using pp(y) = 2^(1-n) * y - c
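
a plaintext end-to-end sketch of these steps in python/numpy (no secret sharing; the parameter names `n_bits`, `q`, `rng` and the function name are made up, and a rounded continuous gaussian stands in for a proper discrete gaussian sampler, so this is illustrative only):

```python
import numpy as np

def aggregate_with_noise(gradients, n_bits, sigma, q, rng):
    """Plaintext sketch of steps 1.-7. for c clients."""
    scale = 2 ** (n_bits - 1)
    c = len(gradients)

    encoded = []
    for g in gradients:
        g = g / max(1.0, np.linalg.norm(g))            # 1. clip to L2-norm 1
        g = np.trunc(g * scale) / scale                # 2. round towards 0, n-bit fixed point in (-1, 1)
        encoded.append((scale * g + scale).astype(np.int64))  # 3. p(x) = 2^(n-1)*x + 2^(n-1)

    total = np.sum(encoded, axis=0)                    # 4. sum all gradients
    noise = np.round(rng.normal(0, sigma, total.shape)).astype(np.int64)  # 5. stand-in for discrete gaussian
    y = (total + noise) % q                            # 6. field element
    y = np.where(y > q // 2, y - q, y)                 # interpret large residues as negatives
                                                       # (assumes q is much larger than any attainable sum)
    return 2.0 ** (1 - n_bits) * y - c                 # 7. pp(y) = 2^(1-n)*y - c
```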

privacy

steps 1.-4. describe a function to the integers. since each clipped input gradient satisfies $\lVert x\rVert_2 \leq 1$ because of step 1., exchanging any one clipped input gradient $x$ for any other $x'$ (and hence replacing any one entry in any client's data set by a different entry) changes the result of the computation up to step 4. by at most

$$\left\lVert \sum_{y \in X} \bar p(y) - \sum_{y \in X'} \bar p(y)\right\rVert_2 = \left\lVert \bar p(x) - \bar p(x')\right\rVert_2 \leq 2^{n-1} \cdot \left(\lVert x\rVert_2 + \lVert x'\rVert_2\right) \leq 2^n$$

where $\bar p$ is the componentwise application of the projection $p$ to the integers described above, and $X$ and $X'$ are the input gradient sets with $x$ and $x'$ exchanged and all other inputs the same. hence the sensitivity of steps 1.-4. is $2^n$. for any $\epsilon>0$ and $\sigma = \frac{2^n}{\epsilon}$, step 5. makes the whole procedure $\frac{1}{2} \epsilon^2$-zero-concentrated differentially private (https://arxiv.org/pdf/2004.00010.pdf, theorem 14 with all $\sigma_j = \sigma$). hence to obtain $\rho$-zCDP, we should choose $\sigma = \frac{2^n}{\sqrt{2\rho}}$.
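
a tiny helper for that calibration (python; `sigma_for_zcdp` is just an illustrative name):

```python
import math

def sigma_for_zcdp(rho, n_bits):
    """discrete gaussian scale for rho-zCDP with sensitivity 2^n: sigma = 2^n / sqrt(2*rho)."""
    return (2 ** n_bits) / math.sqrt(2 * rho)

# e.g. n = 16, rho = 0.5  ->  sigma = 2^16 = 65536.0
print(sigma_for_zcdp(0.5, 16))
```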

steps 6.-7. don't affect privacy due to the post-processing invariance of zCDP (https://arxiv.org/pdf/1605.02065.pdf, lemma 1.8).

we need to make sure we're doing the rounding as described in step 2 (#28).

also, this is the privacy guarantee for one training step only. zCDP composes nicely (https://arxiv.org/pdf/1605.02065.pdf, lemma 2.3), so performing $k$ iterations of the above algorithm on the same client data sets yields a $k \cdot \rho$-zCDP algorithm.
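
splitting a total budget across k steps then looks like this (sketch, assuming an equal budget per step):

```python
import math

def sigma_for_k_steps(total_rho, k, n_bits):
    """each of the k steps gets rho = total_rho / k, so sigma = 2^n * sqrt(k / (2 * total_rho))."""
    return (2 ** n_bits) / math.sqrt(2 * (total_rho / k))
```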

federation

steps 1.-3. are done locally by the clients. the resulting integers are secret shared among the aggregators, which verify that the clipping was done correctly using our vdaf. the aggregators perform steps 4.-7. on the ciphertext in a distributed manner, except for the noising in step 5., which each aggregator does with plaintext noise known only to itself. each aggregator has to add the full amount of noise, because the aggregators are assumed to be honest but curious: if one of them did not add all the noise necessary for the DP guarantee, the other could reconstruct a less-noised version of the result. the field used for this part of the computation is chosen large enough to avoid modulo wraparound in the sum computation, which allows us to compute the sensitivity as if the function were defined on the integers.
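
roughly, the share-and-aggregate flow looks like this (python sketch, not our actual vdaf; the function names are made up, and the rounded continuous gaussian again stands in for a proper discrete gaussian sampler):

```python
import numpy as np

def share(x, q, num_aggregators, rng):
    """additively secret-share an integer vector x over Z_q."""
    shares = [rng.integers(0, q, size=x.shape) for _ in range(num_aggregators - 1)]
    shares.append((x - np.sum(shares, axis=0)) % q)
    return shares

def aggregator_step(client_shares, sigma, q, rng):
    """one aggregator sums its shares and adds its own full-strength noise."""
    partial = np.sum(client_shares, axis=0) % q
    noise = np.round(rng.normal(0, sigma, partial.shape)).astype(np.int64)
    return (partial + noise) % q
```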
