

validation split #8

Open
arbelhizmi opened this issue Dec 17, 2024 · 1 comment

Comments

@arbelhizmi

In the main_dino.py file, you use a validation_split. Why is this needed? Self-supervised DINO training uses no labels, so how is validation possible? And if validation serves no purpose, why split the dataset into training and validation sets?
Also, what is self.target_transform for in the Multichannel_dataset class? I understand this attribute is standard, but why use it when training in a self-supervised manner?

@pfaendler
Collaborator

We did this so that we always have a subset of the data that was not part of training available for downstream analyses (e.g. plotting a UMAP to see how cells cluster). Even though labels are not used in the self-supervised setting, we think this is cleaner: it lets us investigate how the model handles, or 'views', images of cells it has never seen before.
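To make the idea of a fixed held-out subset concrete, here is a minimal sketch of a reproducible train/validation index split. It assumes a dataset of `n_samples` items and a fixed random seed. The function name `split_train_val` and its parameters are hypothetical and not taken from main_dino.py, which may implement the split differently (for example with `torch.utils.data.random_split`). The returned index lists could be wrapped in `torch.utils.data.Subset` to build the two loaders.

```python
import random


def split_train_val(n_samples, val_fraction=0.1, seed=0):
    """Hypothetical helper: return disjoint (train_indices, val_indices) covering range(n_samples)."""
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError("val_fraction must be in [0, 1)")
    indices = list(range(n_samples))
    # A fixed seed means the same cells are held out on every run,
    # so downstream analyses (e.g. a UMAP) always see images the model never trained on.
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_val = int(round(n_samples * val_fraction))
    val_indices = sorted(indices[:n_val])
    train_indices = sorted(indices[n_val:])
    return train_indices, val_indices


train_idx, val_idx = split_train_val(1000, val_fraction=0.2, seed=42)
```

Self-supervised training would iterate only over `train_idx`. After training, embeddings for `val_idx` can be computed and inspected to see how unseen cells are placed by the model.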
Regarding self.target_transform: we implemented the custom dataset the way it is usually done in PyTorch, so we kept it. In addition, when label information is available, we use the labels in the downstream analyses after self-supervised training, and keeping target_transform lets us use the same dataloader for that.
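The pattern of one dataset serving both phases can be sketched as a map-style dataset that always returns an `(image, label)` pair. This is a hypothetical illustration, not the repository's Multichannel_dataset. The class name `MultichannelDatasetSketch` and the `-1` placeholder label are assumptions. A real implementation would typically subclass `torch.utils.data.Dataset` and load multichannel image arrays instead of plain values.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class MultichannelDatasetSketch:
    """Hypothetical map-style dataset following the usual PyTorch transform/target_transform convention."""

    def __init__(
        self,
        images: Sequence[Any],
        labels: Optional[Sequence[Any]] = None,
        transform: Optional[Callable[[Any], Any]] = None,
        target_transform: Optional[Callable[[Any], Any]] = None,
    ):
        if labels is not None and len(labels) != len(images):
            raise ValueError("labels must have the same length as images")
        self.images = images
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        image = self.images[idx]
        if self.transform is not None:
            # e.g. the augmentations used during self-supervised training
            image = self.transform(image)
        if self.labels is None:
            # Assumed placeholder: the self-supervised loop ignores the label anyway
            return image, -1
        label = self.labels[idx]
        if self.target_transform is not None:
            # e.g. map a cell-type name to an integer class index for downstream analysis
            label = self.target_transform(label)
        return image, label
```

During self-supervised training the loop can discard the second element (`for image, _ in loader`). For downstream analyses, the same class can be constructed with labels and a `target_transform`, so no separate dataloader code is needed.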
I hope that clarified things!
