Does NeMo-Megatron-Launcher support training from bare metal environment #132

Open
zigzagcai opened this issue Sep 19, 2023 · 1 comment

zigzagcai commented Sep 19, 2023

Hello, I want to run NeMo-Megatron-Launcher on a Slurm cluster where I don't have root access (so Docker Engine cannot be installed), and I can't find a reference guide for training in a bare-metal environment.
I tried installing the packages listed in the provided Dockerfile, but ran into package installation failures and crashes in the code.
Could you please provide some hints for running NeMo-Megatron-Launcher in a bare-metal environment? Thanks!

zigzagcai commented Sep 27, 2023

Update:
After some work, the main branch code now runs in a bare-metal environment.
For anyone else who wants to run NeMo in a bare-metal environment, FYI:
https://github.com/zigzagcai/NeMo-Megatron-Launcher/tree/baremetal_run
https://github.com/zigzagcai/NeMo/tree/baremetal_run
https://github.com/zigzagcai/Megatron-LM/tree/baremetal_run
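
For anyone who wants to try these branches, here is a minimal sketch (not part of the linked repos) that turns the three branch links above into `git clone` commands. The helper name `clone_command` is made up for illustration, and the sketch assumes each link has the usual `<owner>/<repo>/tree/<branch>` shape. It only prints the commands; installing dependencies and configuring the launcher are not covered here.

```python
import subprocess
from urllib.parse import urlparse

# Branch links copied from the comment above.
TREE_URLS = [
    "https://github.com/zigzagcai/NeMo-Megatron-Launcher/tree/baremetal_run",
    "https://github.com/zigzagcai/NeMo/tree/baremetal_run",
    "https://github.com/zigzagcai/Megatron-LM/tree/baremetal_run",
]


def clone_command(tree_url):
    """Hypothetical helper: map '.../<owner>/<repo>/tree/<branch>' to a git clone command."""
    parsed = urlparse(tree_url)
    # Split the path into owner, repo, the literal "tree" marker, and the branch name.
    owner, repo, marker, branch = parsed.path.strip("/").split("/", 3)
    if marker != "tree":
        raise ValueError(f"not a branch link: {tree_url}")
    repo_url = f"{parsed.scheme}://{parsed.netloc}/{owner}/{repo}.git"
    return ["git", "clone", "--branch", branch, "--single-branch", repo_url]


if __name__ == "__main__":
    for url in TREE_URLS:
        cmd = clone_command(url)
        print(" ".join(cmd))
        # Uncomment to actually clone (requires git and network access):
        # subprocess.run(cmd, check=True)
```

Run it on a login node and either copy the printed commands or uncomment the `subprocess.run` line.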
