distributed with p5 instances #4988

Open
abdalgader-a opened this issue Jan 7, 2025 · 3 comments
Assignees
Labels
component: training Relates to the SageMaker Training Platform

Comments

abdalgader-a commented Jan 7, 2025

Hi, referring to the line above: does this still not support SMModelParallel and SMDataParallel when running on p5 instances?

Thanks!

@abdalgader-a
Author

@ahsan-z-khan any thoughts?

Contributor

benieric commented Feb 1, 2025

Hi @abdalgader-a, going by these docs: https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-data-parallel-support.html#distributed-data-parallel-supported-instance-types

and the release notes: https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-release-notes.html

it does not seem like it. However, we recently released a new training interface called ModelTrainer, which does not contain such a check, so you should at least be able to boot up a job and verify whether it behaves as you expect - (example)
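For readers unfamiliar with what "such a check" means here, the following is a minimal sketch of a client-side instance-type allow-list, the kind of validation that rejects a job before it is ever submitted. It is not the SDK's code: `check_instance_type`, `CheckResult`, and `EXAMPLE_SUPPORTED_TYPES` are hypothetical names, and the allow-list contents are placeholders, so consult the docs linked above for the real supported types.

```python
from dataclasses import dataclass

# Placeholder allow-list for illustration only. This is NOT the SDK's
# actual list; see the supported-instance-types docs for real values.
EXAMPLE_SUPPORTED_TYPES = frozenset({"ml.p4d.24xlarge", "ml.p4de.24xlarge"})


@dataclass(frozen=True)
class CheckResult:
    instance_type: str
    allowed: bool
    reason: str


def check_instance_type(instance_type, supported=EXAMPLE_SUPPORTED_TYPES):
    """Mimic a client-side allow-list check run before job submission."""
    if instance_type in supported:
        return CheckResult(instance_type, True, "listed as supported")
    return CheckResult(
        instance_type,
        False,
        "not in the allow-list; the job would be rejected client-side",
    )


if __name__ == "__main__":
    for candidate in ("ml.p4d.24xlarge", "ml.p5.48xlarge"):
        print(check_instance_type(candidate))
```

An interface without this kind of check simply submits the job, so whether the distributed libraries actually work on the instance has to be verified from the job's own behavior.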

benieric added the "component: training" label Feb 3, 2025
nargokul assigned benieric and nargokul and unassigned benieric Feb 4, 2025
@abdalgader-a
Author

@benieric -- thanks for getting back to me on this. I'll refactor based on the new training interface and test it.
