Deleting History of a Cancelled Experiment #65

Open
Leon-Leyang opened this issue Jul 16, 2024 · 1 comment
Labels
question Further information is requested

Comments

@Leon-Leyang

❓ Questions

Hi @adefossez,

Thank you for developing such a useful package. I have encountered an issue with rerunning a cancelled experiment (XP). Specifically, I'm using dora grid baseline --clear with the expectation that the experiment would start from scratch. Initially, it appears to work as the history.json file is deleted. However, as the process continues, the previous history reappears before the current training starts, causing the training to resume from the last cancellation point.

This issue does not occur if the previous experiment completes successfully; in those cases, the --clear option works as expected. Could you advise on how to ensure that a cancelled experiment restarts completely from scratch when rerun?

Thank you for your assistance.

@Leon-Leyang Leon-Leyang added the question Further information is requested label Jul 16, 2024
@adefossez
Contributor

This might happen if the previous experiment is still running and was not properly cancelled. Could you double check that the previous XP is no longer running on the cluster? It is normally cancelled, but depending on the Slurm configuration, the experiment may continue running for 1 or 2 minutes.
One solution would be to first run dora grid baseline -C to cancel all the experiments, wait until they are indeed gone from the cluster, and then run the --clear command.
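
The cancel, wait, then clear sequence above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (`run`, `pending_jobs`, `cancel_wait_clear`) are hypothetical and not part of Dora, and it assumes a Slurm cluster where `squeue -h -u <user> -o %i` lists the remaining job IDs for a user. It waits until all of that user's jobs are gone, not only those of the grid, because how the grid's jobs are named in Slurm is not stated in this thread.

```python
import subprocess
import time


def run(cmd, capture=False):
    """Run a command; return its stdout when captured, else an empty string."""
    result = subprocess.run(
        cmd,
        check=True,
        text=True,
        stdout=subprocess.PIPE if capture else None,
    )
    return result.stdout or ""


def pending_jobs(user, runner=run):
    """Return the Slurm job IDs that squeue still lists for `user`."""
    # -h: no header line, -u: filter by user, -o %i: print only the job ID.
    out = runner(["squeue", "-h", "-u", user, "-o", "%i"], capture=True)
    return [line.strip() for line in out.splitlines() if line.strip()]


def cancel_wait_clear(grid, user, runner=run, sleep=time.sleep,
                      poll_s=10, timeout_s=300):
    """Cancel the grid, wait until the user's jobs are gone, then clear it."""
    # Step 1: cancel all experiments of the grid (the -C flag from the thread).
    runner(["dora", "grid", grid, "-C"])

    # Step 2: poll the queue; jobs may linger for a minute or two.
    waited = 0
    while pending_jobs(user, runner):
        if waited >= timeout_s:
            raise TimeoutError("jobs still listed by squeue after waiting")
        sleep(poll_s)
        waited += poll_s

    # Step 3: only now rerun with --clear, so no old job can rewrite history.
    runner(["dora", "grid", grid, "--clear"])
```

Calling `cancel_wait_clear("baseline", "<your-username>")` would run the three steps in order. The `runner` and `sleep` parameters exist only so the sequence can be exercised without a real cluster.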
