Implement separate procedure to "share with consortium" #21

Open
kvantricht opened this issue Oct 30, 2024 · 13 comments

@kvantricht

Next to the option to initiate the "make public" procedure for user datasets, we need a separate, simpler way to share a dataset with the consortium so we can use it in training.

@santoshkaranam
Collaborator

@jdegerickx

@cbutsko, @kvantricht
Open question: do we need the quality control (and hence dataset confidence label) before we can actually use the dataset?

@jdegerickx

jdegerickx commented Nov 13, 2024

to be discussed: do we need the license and link for each dataset that we use for training (in order to publish the products later on)?
Solution: we can suggest a license ourselves and use the name + organization of the user to correctly refer to the dataset in any downstream publication

@cbutsko

cbutsko commented Nov 14, 2024

While we are building the proper sharing pipeline, here are two options for a temporary solution:

  1. Just add a checkbox that triggers an automatic notification sent to our emails. Then we can have a look at the dataset and decide whether we want to use it.
    image

  2. After the dataset is successfully uploaded as a private one, the user gets a pop-up window (see the example in the picture below). There, we briefly describe what sharing with the consortium means and underline the fact that the dataset is not gonna be publicly available. We can add a simple checkbox asking about potential willingness to share the data with the public. This should also result in an email notification to us (developers), and we can handle it manually on a per-dataset basis at first.
    image

When the proper sharing procedure is settled, all datasets that have been shared in a simpler way will also need to acquire the necessary attribute, or they won't be used for training.
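
A rough sketch of what the temporary solution could look like in code, purely for illustration. The inbox address, the function names and the `consortium_verified` attribute are all made up here; the thread does not say how the notification is sent or what the "necessary attribute" will be called.

```python
from email.message import EmailMessage

# Hypothetical address; the thread only says "our emails".
DEVELOPER_INBOX = "developers@example.org"


def build_share_notification(dataset_id: str, uploader: str, open_to_public: bool) -> EmailMessage:
    """Build the email triggered when a user ticks "share with consortium"."""
    msg = EmailMessage()
    msg["To"] = DEVELOPER_INBOX
    msg["Subject"] = f"Dataset {dataset_id} shared with consortium"
    msg.set_content(
        f"Uploader: {uploader}\n"
        f"Dataset: {dataset_id}\n"
        f"Willing to share publicly later: {'yes' if open_to_public else 'no'}\n"
        "Please review manually before using it for training."
    )
    return msg


def usable_for_training(datasets: list[dict]) -> list[dict]:
    """Keep only datasets that carry the attribute set by the proper procedure.

    'consortium_verified' is an assumed attribute name, not one from the thread.
    """
    return [d for d in datasets if d.get("consortium_verified", False)]
```

Actually sending the message (for example via `smtplib`) is left out on purpose, since the mail setup is not described here.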

@jdegerickx

to add to the pop-up message:
You will be accredited in official WorldCereal publications for your contributions.

@santoshkaranam
Collaborator

@jdegerickx can we change this message a bit to inform about citation as well?
"You will be accredited in official WorldCereal publications for your contributions" ---> "You will be accredited with your login username and institute, in official WorldCereal publications for this contribution." ?

@jdegerickx

@santoshkaranam, good question. Personally I would avoid being too specific at this point because we still might change the procedure. If they already provided a preferred way of citation through the metadata, we will of course use that instead of username and institute...

@santoshkaranam
Collaborator

@jdegerickx @kvantricht should we make it the default selection, with the checkbox checked initially? If the user does not want to share, they have to uncheck it; otherwise all uploaded datasets are shared with the consortium (privately) by default.

If we decide to make it the default option, then this checkbox has to be included in the very first step, where the user uploads the dataset file.

@jdegerickx

@santoshkaranam, could you briefly explain again what happens in the background when a user wants to share the data with the consortium?
The dataset needs to be stored elsewhere? (community -> consortium store?)
Or can you just edit the user privileges and grant the worldcereal-extractions account read access to the dataset, while keeping it in the community store?
Just trying to figure out why the user should indicate this before the upload...

@santoshkaranam
Collaborator

@jdegerickx
"Share with consortium" will be just a flag; there is no need to move the dataset anywhere.

The point of having it checked by default is to get as many user datasets to the consortium as possible. The flag can be set at any time in the upload workflow.

If it is at the first step, the user will be informed before uploading.

@jdegerickx

I would keep it as Christina suggested just at the end of the workflow.
We can always inform the user in the introduction before the upload...

@santoshkaranam
Collaborator

OK, sure, so this will be an explicit selection by the user.
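
For illustration, the flag-based approach with explicit opt-in might be modelled roughly as below. This is only a sketch: the `Dataset` class, the field name `shared_with_consortium` and the `can_read` helper are assumptions, and the account name is borrowed from the question earlier in the thread rather than from any confirmed design.

```python
from dataclasses import dataclass

# Account name borrowed from the question above; treating it as the
# consortium reader is an assumption.
CONSORTIUM_ACCOUNT = "worldcereal-extractions"


@dataclass
class Dataset:
    dataset_id: str
    owner: str
    # Explicit opt-in: stays False unless the user ticks the checkbox.
    shared_with_consortium: bool = False


def can_read(account: str, dataset: Dataset) -> bool:
    """The owner can always read; the consortium account only if flagged."""
    if account == dataset.owner:
        return True
    return account == CONSORTIUM_ACCOUNT and dataset.shared_with_consortium
```

Because sharing is only a flag on the record, the dataset stays in the community store and nothing has to be copied when the user opts in.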

@jdegerickx

what happens when user selects to share with consortium:

  • dataset is marked as "restricted" and "unverified" --> worldcereal moderator account gets access to the dataset and gets notified by mail
  • worldcereal moderator performs a quick check and marks the data as verified OR refuses the dataset
  • moderator should have option to edit dataset metadata
  • as soon as dataset is approved by moderator, metadata should no longer be editable
  • dataset is marked as "restricted" and "verified" --> worldcereal user account gets access to dataset in "user collections"
  • worldcereal user gets notified by mail when the dataset is made available
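
The steps above can be read as a small state machine. The sketch below is one possible way to express it, under stated assumptions: the class and method names, the `PRIVATE` and `REFUSED` states, and the in-memory `outbox` standing in for real emails are all hypothetical. Whether metadata stays editable after a refusal is not specified above; the sketch allows it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PRIVATE = "private"                    # assumed starting state
    UNVERIFIED = "restricted/unverified"
    VERIFIED = "restricted/verified"
    REFUSED = "refused"                    # assumed name for a refused dataset


@dataclass
class SharedDataset:
    dataset_id: str
    metadata: dict = field(default_factory=dict)
    status: Status = Status.PRIVATE
    outbox: list = field(default_factory=list)  # stands in for real emails

    def share(self) -> None:
        """User opts in: the moderator gets access and a notification."""
        if self.status is not Status.PRIVATE:
            raise ValueError("dataset was already shared")
        self.status = Status.UNVERIFIED
        self.outbox.append(("moderator", f"{self.dataset_id} awaits review"))

    def edit_metadata(self, key: str, value: str) -> None:
        """Edits are allowed only until the moderator approves."""
        if self.status is Status.VERIFIED:
            raise PermissionError("metadata is frozen after approval")
        self.metadata[key] = value

    def review(self, approve: bool) -> None:
        """Moderator marks the dataset as verified or refuses it."""
        if self.status is not Status.UNVERIFIED:
            raise ValueError("only unverified datasets can be reviewed")
        if approve:
            self.status = Status.VERIFIED
            self.outbox.append(("worldcereal-user", f"{self.dataset_id} is available"))
        else:
            self.status = Status.REFUSED
```

Keeping the status in a single enum makes it hard to end up in a combination the list does not describe, such as "verified" but not "restricted".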
