Related work #19

Open
shankari opened this issue Mar 5, 2020 · 4 comments

Comments

@shankari
Contributor

shankari commented Mar 5, 2020

Here, we will list related, commercially available projects and how they are different from UPC.
So far, we have:

  • gmail, etc: horizontally scalable service
  • AWS, etc: provide virtual hardware in the cloud
  • heroku, digital ocean, etc: provide horizontally scalable cloud services but without privacy
  • encrypted cloud-based data stores: store and share data, no compute
  • user cloud services e.g. nextcloud: WebDAV based storage + admin chosen "apps" on top. Still multi-tenant, admin has access to everything
  • nextcloud + encryption: maps to encrypted cloud-based data storage (as far as I can make out. I should really enable this on my instance and see what I find).

Key differentiators:

  • individual user control
  • combine storage and compute (more 'active' than encrypted file storage)
  • aggregated queries across instances
@njriasan
Contributor

Here are some additional similarities I see:

Password managers/password vaults:

  • Password vaults rely on a stored vault containing all the passwords for a user's various accounts. We shouldn't have a vault of passwords, but our vault is probably based around services and permissions. If we want to enforce permissions at all, we need to organize our data by type and then derive keys to access each type of data. Our vault can therefore be thought of both as our key generation/selection process and as the manifest indicating allowed services and permissions. Alternatively, rather than deriving keys, we could have a more traditional password vault model where there are many keys (which we would split on datatype and/or device) and the user manifest indicates which keys can be given out.
  • Password managers provide their vaults either on device or in the cloud. It's probably most sensible to store a copy of the manifest in the cloud so users can sync devices, and to require the master key to push any changes. While local validation seems reasonable, it's probably feasible to have downloading the latest manifest be the first step. Similarly, if we were to add two-factor authentication, it would probably make the most sense to place this burden on changing the manifest (similar to how password managers add 2-factor for using the master key).
  • Another similarity is that password managers struggle with completely scrubbing secrets (https://www.ise.io/casestudies/password-manager-hacking/). We have implicitly treated these as threats to be mitigated in future work, but it's worth noting that similar techniques would probably benefit us.
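
To make the key-derivation variant above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the manifest layout, the service and data type names, and the use of HMAC-SHA256 as a stand-in key derivation function are hypothetical and not taken from any existing UPC code.

```python
import hashlib
import hmac

# Hypothetical manifest: service name -> data types the user allows it to read.
MANIFEST = {
    "trip_planner": {"location"},
    "fitness_summary": {"location", "heart_rate"},
}


def derive_key(master_key: bytes, data_type: str) -> bytes:
    """Derive a per-data-type key from the master key.

    HMAC-SHA256 is used here as a simple stand-in KDF (an assumption);
    a real design would likely pick a vetted KDF such as HKDF.
    """
    return hmac.new(master_key, data_type.encode("utf-8"), hashlib.sha256).digest()


def keys_for_service(master_key: bytes, manifest: dict, service: str) -> dict:
    """Return only the keys the manifest allows this service to receive."""
    allowed = manifest.get(service, set())  # unknown services get nothing
    return {data_type: derive_key(master_key, data_type) for data_type in sorted(allowed)}
```

With this layout, `keys_for_service(master_key, MANIFEST, "trip_planner")` would hand out only the `location` key, and a service missing from the manifest would get an empty dict. The many-keys variant would replace `derive_key` with a lookup into stored keys.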

Personal Data Stores (Solid, Hub of All Things, OpenPDS):

  • These are very similar to what we are suggesting where users have the single source of data.
  • Proposals seem to all require your own server (which of course we could extend for more security) and seem to just control data access.
  • They lack encryption and seem to require new apps for usage (we would also require new apps, but these would just be docker images for us).

Personal Clouds (Cozy Cloud, NextCloud, MyCloud, Freedom Box):

  • Similar to personal data stores except they also restrict data to just their own services.
  • They often require either separate specialized hardware or owning and operating your own server (which we could view as an extension for greater protection).
  • The biggest difference seems to be that they have a restricted service ecosystem, whereas we want to provide a mechanism to add new services to the ecosystem.

Platforms as a Service (Heroku, Google App Engine):

  • Strongest similarity is that it allows for the creation of your own software stack but all services are generally subject to platform restrictions. This seems very similar to what we are requesting for users except with a different set of rules (especially differential privacy for global aggregators) and encryption throughout.

If This Then That:

  • Applets perhaps most aptly map to what we are suggesting for providing services to integrate devices.
  • They seem to rely on drivers for uploading data streams and unfortunately we may as well.
  • One big issue is that a lot of their applets are broken, which suggests it might be worth giving some thought to how service review needs to change to try to prevent broken and/or rule-violating services.

@shankari
Contributor Author

Stuart Macmillan pointed me to
https://flip.it/qP_HNR
which labels itself as "GitHub for Data".

My take on it is that it was primarily focused on publishing synthetic data, similar to prior work on publishing cellphone location data, e.g.

Mir, Darakhshan J., Sibren Isaacman, Ramon Caceres, Margaret Martonosi, and Rebecca N. Wright. 2013. “DP-WHERE: Differentially Private Modeling of Human Mobility.” In 2013 IEEE International Conference on Big Data, 580–88. Silicon Valley, CA, USA: IEEE. https://doi.org/10.1109/BigData.2013.6691626.

Jack looked into that for his MS thesis, and what we found was that in order to get the DP privacy budget right, the data is typically really coarse. This is fine-ish if you are talking about cellular data, which is already coarse to begin with, but not that great if you want to explore street-level data from fine-grained GPS traces.
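
A rough back-of-the-envelope sketch of why the budget forces coarse data. It assumes a plain Laplace mechanism on per-cell counts with sensitivity 1, which is a simplification and not the method used in DP-WHERE or in the thesis; the function name and the numbers are made up for illustration.

```python
def expected_relative_noise(num_points: int, num_cells: int, epsilon: float,
                            sensitivity: float = 1.0) -> float:
    """Expected |Laplace noise| per cell divided by the mean count per cell.

    Assumes points are spread evenly over the grid and each cell count gets
    Laplace noise with scale sensitivity / epsilon (expected magnitude equals
    that scale).
    """
    noise_scale = sensitivity / epsilon
    mean_count = num_points / num_cells
    return noise_scale / mean_count


# Same 100,000 points and epsilon = 0.1, two hypothetical grid resolutions.
coarse = expected_relative_noise(100_000, 10 * 10, epsilon=0.1)      # about 0.01
fine = expected_relative_noise(100_000, 1000 * 1000, epsilon=0.1)    # about 100
```

Under these assumptions the noise is about 1% of a typical count on the coarse grid but about 100 times the typical count on the fine grid, so the fine-grained cells are swamped unless the grid is coarsened or the budget is loosened.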

They also still assume that the datasets are controlled by developers at big organizations, instead of users controlling their own data.

@shankari
Contributor Author

How about enigma?
https://enigma.co/

They published a cryptographic solution for contact tracing
https://blog.enigma.co/safetrace-privacy-preserving-contact-tracing-for-covid-19-c5ae8e1afa93

@shankari
Contributor Author

Also, for the record, openmined (https://www.openmined.org/) does DP learning. Not directly related, but good to put into related work.
