Fix DDP unused param error when TE is enabled in NeMo Lite #11364
base: main
Signed-off-by: Onur Yilmaz <[email protected]>
if self.model_accelerator == "te":
    from nemo.lightning.pytorch.accelerate.transformer_engine import te_accelerate

    te_accelerate(self.model, fp8_autocast=False)
how do you intend other settings like fp8 autocast to be specified?
Good question. I guess I should change that to a parameter class.
OK, changed it to a partial function as @akoumpa suggested.
examples/llm/sft/hf.py
Outdated
@@ -75,17 +75,9 @@ def squad(tokenizer) -> pl.LightningDataModule:
     grad_clip = None
     use_dist_samp = False

-    model = llm.HfAutoModelForCausalLM(args.model)
+    model = llm.HfAutoModelForCausalLM(model_name=args.model, model_accelerator=args.model_accelerator)
can we pass a partial function instead?
I also thought about it but I am just not sure how it'll look from the user perspective.
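For illustration, here is a rough sketch of how the partial-function approach might look from the user's side. Everything in it is a stand-in: `te_accelerate_stub` and `AutoModelSketch` are hypothetical names that only mimic the shape of `te_accelerate` and `llm.HfAutoModelForCausalLM`, and the plain dict takes the place of a real model. The point is only that settings such as `fp8_autocast` get bound by the caller, so the model class never needs to know about them.

```python
from functools import partial


def te_accelerate_stub(model, fp8_autocast=False):
    """Hypothetical stand-in for te_accelerate; it only records the settings."""
    model["accelerated"] = True
    model["fp8_autocast"] = fp8_autocast
    return model


class AutoModelSketch:
    """Hypothetical wrapper, not the real llm.HfAutoModelForCausalLM."""

    def __init__(self, model_name, model_accelerator=None):
        self.model_name = model_name
        # Any callable that takes the model as its only positional argument.
        self.model_accelerator = model_accelerator
        self.model = None

    def configure_model(self):
        # A dict stands in for the loaded model (assumption for this sketch).
        self.model = {"name": self.model_name}
        if self.model_accelerator is not None:
            # Accelerator-specific options were already bound via partial.
            self.model_accelerator(self.model)
        return self.model


# User side: bind the settings once, then hand the callable to the model.
accelerator = partial(te_accelerate_stub, fp8_autocast=True)
wrapper = AutoModelSketch("some-model", model_accelerator=accelerator)
configured = wrapper.configure_model()
print(configured)
```

With this shape, the hard-coded `fp8_autocast=False` moves out of the model class, and leaving `model_accelerator` unset skips acceleration entirely.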
Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
[🤖]: Hi @oyilmaz-nvidia 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
What does this PR do?
Fixes the DDP unused-parameter error that occurs when Transformer Engine (TE) is enabled in NeMo Lite.