Measuring coverage for various anonymization schemes #17

Open
yoid2000 opened this issue Oct 26, 2018 · 0 comments
yoid2000 commented Oct 26, 2018

Coverage ranges from 0 to 1 and is computed as the average of the per-column coverage values over all columns. The set of columns is taken from the raw data.

If a given column does not exist in the anonymized dataset (i.e. it was removed by the anonymization), then the coverage value for that column is 0.

If a given continuous column exists in the anonymized dataset but range queries cannot be made over it, then the coverage value for that column is 0. If range queries can be made over the column, then the coverage value is 1. Note that range queries can be made for both Aircloak and raw datasets. We may have to establish different tests for range-query support for different anonymization schemes.

For enumerative columns that exist in the anonymized dataset, we compute coverage the same way we already do: the number of distinct column values in the anonymized dataset divided by the number of distinct column values in the raw dataset that have more than one user.
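
As a rough sketch of the rules above (not the code in this repo), the per-column and overall coverage could be computed as follows. The `ColumnReport` class, its field names, and both function names are hypothetical and only illustrate the rules. Clamping the enumerative ratio to 1 and returning 0 when the raw column has no values with more than one user are assumptions on my part.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ColumnReport:
    """Hypothetical summary of one raw-data column (names are illustrative)."""
    kind: str                         # "continuous" or "enumerative"
    in_anonymized: bool               # False if the anonymization removed the column
    range_queryable: bool = False     # only meaningful for continuous columns
    anon_distinct: int = 0            # distinct values seen in the anonymized data
    raw_distinct_multi_user: int = 0  # distinct raw values with more than one user


def column_coverage(col: ColumnReport) -> float:
    # A column removed by the anonymization has coverage 0.
    if not col.in_anonymized:
        return 0.0
    # Continuous columns: 1 if range queries are possible, otherwise 0.
    if col.kind == "continuous":
        return 1.0 if col.range_queryable else 0.0
    # Enumerative columns: ratio of distinct values (anonymized over raw).
    if col.raw_distinct_multi_user == 0:
        return 0.0  # assumption: nothing to cover, avoid division by zero
    ratio = col.anon_distinct / col.raw_distinct_multi_user
    return min(1.0, ratio)  # assumption: clamp so coverage stays within 0..1


def dataset_coverage(columns: Dict[str, ColumnReport]) -> float:
    # Average over all columns known from the raw data.
    if not columns:
        return 0.0
    return sum(column_coverage(c) for c in columns.values()) / len(columns)


# Made-up example: one range-queryable column, one removed, one partly covered.
example = {
    "age": ColumnReport("continuous", in_anonymized=True, range_queryable=True),
    "zip": ColumnReport("enumerative", in_anonymized=False),
    "city": ColumnReport("enumerative", True, anon_distinct=30,
                         raw_distinct_multi_user=40),
}
overall = dataset_coverage(example)  # (1.0 + 0.0 + 0.75) / 3
```

How `range_queryable` gets determined is left open here, since that is the part that may need a different test per anonymization scheme.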

Note that for some differential privacy anonymization schemes, you simply won't be able to make additional queries at some point. When this happens, any remaining unqueried columns will have a coverage value of 0. (I'll make a new issue for this when we have such an anonymization scheme in place.)
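
One possible way to handle the out-of-queries case is sketched below. Everything here is hypothetical: the `QueryBudgetExhausted` exception, the `coverage_with_budget` function, and the `measure` callback are placeholders, since we don't yet have such an anonymization scheme in place and I don't know how it would signal that no more queries are allowed.

```python
from typing import Callable, Dict, Iterable


class QueryBudgetExhausted(Exception):
    """Hypothetical signal that the scheme refuses any further queries."""


def coverage_with_budget(
    columns: Iterable[str],
    measure: Callable[[str], float],
) -> Dict[str, float]:
    """Measure columns in order; once the budget runs out, the rest get 0."""
    per_column: Dict[str, float] = {}
    exhausted = False
    for name in columns:
        if exhausted:
            # Remaining unqueried columns have a coverage value of 0.
            per_column[name] = 0.0
            continue
        try:
            per_column[name] = measure(name)
        except QueryBudgetExhausted:
            # The column being measured when queries ran out also gets 0.
            exhausted = True
            per_column[name] = 0.0
    return per_column
```

Note that with this approach the order in which columns are queried affects which ones end up with 0, so the order would need to be fixed for results to be comparable.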

yoid2000 added a commit that referenced this issue Jan 10, 2019
yoid2000 added a commit that referenced this issue Jan 14, 2019
Still needs testing and cleanup, but basic thing seems to be working