Initialize event storage when loading checkpoint in lightning module #5322

Open
wants to merge 1 commit into
base: main

mpowelson

If you construct the Lightning training module and then immediately load a checkpoint, it crashes because self.storage is None until the first training_step call. This PR creates the event storage in on_load_checkpoint as well. I'm not sure whether this is the best solution or whether the setup should actually go in init, although the comment in training_step indicates it should not go there.

In training_step, several things are set up when self.storage is None. I split the self.writers setup out from the storage setup, but something may need to be done with iteration_timer as well. My current use case does not involve resuming training from a checkpoint, so I haven't tested that path.
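
For illustration, here is a minimal sketch of the idea, not the actual diff. It uses stand-in classes instead of the real detectron2 and Lightning types so that it runs on its own. The names EventStorageStub, TrainingModuleSketch, _ensure_storage and _ensure_writers are hypothetical, and the "iteration" checkpoint key is an assumption about what the checkpoint contains.

```python
from typing import Any, Dict, Optional


class EventStorageStub:
    """Hypothetical stand-in for the event storage; tracks only `iter`."""

    def __init__(self, start_iter: int = 0) -> None:
        self.iter = start_iter


class TrainingModuleSketch:
    """Hypothetical stand-in for the Lightning training module."""

    def __init__(self) -> None:
        # Storage and writers are created lazily, not in __init__.
        self.storage: Optional[EventStorageStub] = None
        self.writers: Optional[list] = None
        self.start_iter = 0

    def _ensure_storage(self) -> None:
        # Create the storage once, whichever hook happens to run first.
        if self.storage is None:
            self.storage = EventStorageStub(self.start_iter)

    def _ensure_writers(self) -> None:
        # Writer setup is split out so it is only triggered by training_step.
        if self.writers is None:
            self.writers = []

    def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
        # "iteration" is an assumed key, used here for illustration only.
        self.start_iter = checkpoint["iteration"]
        self._ensure_storage()  # without this, the next line fails on None
        self.storage.iter = self.start_iter

    def training_step(self, batch: Any, batch_idx: int) -> None:
        self._ensure_storage()
        self._ensure_writers()
        self.storage.iter += 1


module = TrainingModuleSketch()
module.on_load_checkpoint({"iteration": 100})  # works before any training_step
module.training_step(batch=None, batch_idx=0)
print(module.storage.iter)  # prints 101
```

The sketch leaves out iteration_timer entirely; whether it also needs handling in on_load_checkpoint is the open question raised above.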

facebook-github-bot added the CLA Signed label on Jul 2, 2024