tracker.finish to deal with subprocess stuff #801

Merged
merged 1 commit into from
Nov 12, 2024

@dlwh dlwh commented Nov 12, 2024

We use subprocess in marin to invoke Levanter, but for some reason subprocesses don't wait on their own child processes, so wandb doesn't get a chance to finish. This PR fixes that by calling tracker.finish explicitly.

@@ -268,6 +268,9 @@ def compute_log_probs(model, example):
checkpointer = trainer.config.checkpointer.create(trainer.run_id)
checkpointer.wait_until_finished()

# This isn't necessary except when Levanter is run in a subprocess (as happens w/ ray)

In the sense that this gets called automatically?

yes
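
For anyone reading later, here is a minimal sketch of the pattern under discussion: finishing the tracker explicitly at the end of the job instead of relying on it being called automatically at exit. `BufferedTracker` and `run_job` are hypothetical names made up for illustration; this is not Levanter's tracker or the wandb API, only a stand-in that buffers metrics until `finish` is called.

```python
class BufferedTracker:
    """Hypothetical stand-in for a metrics tracker (not Levanter's or wandb's API).

    Metrics are buffered in memory and only "uploaded" when finish() runs,
    which mimics a tracker that needs a final flush before the process exits.
    """

    def __init__(self):
        self.pending = []      # metrics logged but not yet flushed
        self.flushed = []      # metrics that made it out
        self.finished = False

    def log(self, metrics):
        self.pending.append(metrics)

    def finish(self):
        # Idempotent: a second call (e.g. from an automatic exit hook) is a no-op.
        if self.finished:
            return
        self.flushed.extend(self.pending)
        self.pending.clear()
        self.finished = True


def run_job(tracker):
    """Hypothetical job body: log some metrics, then finish explicitly."""
    try:
        tracker.log({"step": 1, "loss": 0.5})
        tracker.log({"step": 2, "loss": 0.4})
    finally:
        # Explicit finish, so nothing depends on an exit hook getting to run.
        tracker.finish()


tracker = BufferedTracker()
run_job(tracker)
```

Because `finish` is idempotent in this sketch, the explicit call is harmless when the job is not run in a subprocess and the automatic call still happens.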

@dlwh dlwh merged commit b0d53a0 into main Nov 12, 2024
8 checks passed
@dlwh dlwh deleted the wandb_finish branch November 12, 2024 19:58