
corrected order of valid and train data arguments in DomainDataModule #82

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

NicolasKuske
Collaborator

Here, the order of the validation and training arguments was mixed up. This error propagates all the way to the trainer, i.e. the training data and validation data get swapped!

Maybe adapt the code to use keyword arguments, so that the order does not matter?
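As a minimal sketch of why keyword arguments would prevent this class of bug (the constructor below is a made-up stand-in, not the real `DomainDataModule` signature):

```python
class DomainDataModule:
    # Hypothetical signature for illustration: train first, then val.
    def __init__(self, train_dataset, val_dataset):
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset


# Positional call: if the caller passes val first by mistake,
# the splits are silently swapped and nothing errors out.
dm_buggy = DomainDataModule(list(range(256)), list(range(128)))

# Keyword call: argument order no longer matters, so the swap cannot happen.
dm_fixed = DomainDataModule(
    val_dataset=list(range(256)),
    train_dataset=list(range(128)),
)

print(len(dm_fixed.train_dataset))  # 128
print(len(dm_fixed.val_dataset))    # 256
```

With keyword arguments, a mismatch between the call site and the signature becomes impossible rather than merely unlikely.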

PS:

If you want to double-check the lengths of the training and validation data used in the trainer, add the following to the `DomainDataModule` class:

def get_train_length(self):
    return len(self.train_dataset)

def get_val_length(self):
    return len(self.val_dataset)

Then, before the unimodal module, add the following class:

from lightning.pytorch.callbacks import Callback


class DatasetLengthLogger(Callback):
    """Print the train/val dataset lengths once training finishes."""

    def on_train_end(self, trainer, pl_module):
        train_loader = trainer.datamodule.train_dataloader()
        val_loader = trainer.datamodule.val_dataloader()

        train_length = len(train_loader.dataset)
        val_length = len(val_loader.dataset)

        print(f"Training Data Length: {train_length}")
        print(f"Validation Data Length: {val_length}")

Finally, in the `train` function, before calling the trainer, add `dataset_length_logger = DatasetLengthLogger()`.

Then, in the trainer, add `dataset_length_logger` to the callbacks:

callbacks=[
            ModelCheckpoint(
                dirpath="checkpoints",
                filename=module_name,
                monitor="val_loss",
                mode="min",
                save_top_k=1,
            ),
            dataset_length_logger
        ],
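The full snippet above needs an actual Lightning run, but the callback's logic can be sanity-checked in isolation with stub objects standing in for the trainer and datamodule (the stubs below are made up purely for this check):

```python
from types import SimpleNamespace


class DatasetLengthLogger:
    # Same body as the Lightning callback above, minus the Callback base class.
    def on_train_end(self, trainer, pl_module):
        train_loader = trainer.datamodule.train_dataloader()
        val_loader = trainer.datamodule.val_dataloader()
        print(f"Training Data Length: {len(train_loader.dataset)}")
        print(f"Validation Data Length: {len(val_loader.dataset)}")


# Stub datamodule whose loaders expose a .dataset attribute, like a DataLoader.
stub_dm = SimpleNamespace(
    train_dataloader=lambda: SimpleNamespace(dataset=list(range(128))),
    val_dataloader=lambda: SimpleNamespace(dataset=list(range(256))),
)
stub_trainer = SimpleNamespace(datamodule=stub_dm)

DatasetLengthLogger().on_train_end(stub_trainer, pl_module=None)
# Prints:
# Training Data Length: 128
# Validation Data Length: 256
```

This reproduces the same printed output as the real run, so the callback itself can be verified without training anything.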

Result:

Training Data Length: 128
Validation Data Length: 256

NicolasKuske changed the title from "corrected order of valid and train data in DomainDataModule" to "corrected order of valid and train data arguments in DomainDataModule" on May 19, 2024
@bdvllrs (Collaborator) left a comment
Thanks! Can you update the code in the examples as well?

Also, GWDataModule uses the same order (val, then train); can you change it too, so it is consistent (always train first)?
(You will also have to change the code for loading the model.)

@bdvllrs bdvllrs merged commit 48dc6ca into main Sep 19, 2024
1 check passed
@bdvllrs bdvllrs deleted the NicolasKuske-patch-6 branch September 19, 2024 09:52