feat(eks): update nodegroup gpu check #31445

Closed

Conversation

AlexKaracaoglu

@AlexKaracaoglu AlexKaracaoglu commented Sep 13, 2024

Issue # (if applicable)

Closes #31347.

Reason for this change

The motivating bug is that you cannot combine g5 and g6 instance classes in the same node group; doing so throws the following error: instanceTypes of different architectures is not allowed.

Description of changes

  • (eks) Fixes the isGpuInstanceType check
    • G6/G6E instance classes will now be recognized as GPU instance types, so they can be combined with other GPU instance types in a managed node group that uses multiple instance types
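
To illustrate the idea behind the check, here is a minimal, self-contained sketch. It is not the aws-cdk-lib implementation: the class list, the string-based instance types, and the names GPU_INSTANCE_CLASSES, instanceClassOf, isGpuInstanceTypeSketch, and hasMixedCategories are all assumptions made up for this example. It only shows why a class missing from the GPU list (such as g6) would land in a different category than g5 and make the node group look mixed.

```typescript
// Hypothetical allow-list of GPU instance classes, for illustration only.
// This is NOT the list used by aws-cdk-lib.
const GPU_INSTANCE_CLASSES: string[] = ['p3', 'p4d', 'g4dn', 'g5', 'g6', 'g6e'];

// The instance class is the part before the first dot,
// e.g. "g6e" in "g6e.xlarge".
function instanceClassOf(instanceType: string): string {
  return instanceType.split('.')[0].toLowerCase();
}

// Sketch of a GPU check: exact match on the class, so a listed class
// never matches an unrelated class that merely shares a prefix.
function isGpuInstanceTypeSketch(instanceType: string): boolean {
  return GPU_INSTANCE_CLASSES.indexOf(instanceClassOf(instanceType)) !== -1;
}

// A node group looks "mixed" when some, but not all, of its instance
// types are classified as GPU types.
function hasMixedCategories(instanceTypes: string[]): boolean {
  const gpuCount = instanceTypes.filter(isGpuInstanceTypeSketch).length;
  return gpuCount !== 0 && gpuCount !== instanceTypes.length;
}
```

With g6 present in the list, hasMixedCategories(['g5.xlarge', 'g6.xlarge']) reports a consistent group; if g6 were removed from the list, the same call would report a mixed group, which is the situation the motivating bug describes.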

Description of how you validated changes

Wrote unit tests for the isGpuInstanceType function

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team September 13, 2024 19:43
@github-actions github-actions bot added beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. effort/small Small work item – less than a day of effort p2 labels Sep 13, 2024
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Sep 13, 2024
Contributor

@lpizzinidev lpizzinidev left a comment

Thanks 👍

tools/@aws-cdk/cdk-build-tools/package.json (outdated, resolved)
@aws-cdk-automation aws-cdk-automation added pr/needs-maintainer-review This PR needs a review from a Core Team Member and removed pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. labels Sep 14, 2024
Contributor

@sumupitchayan sumupitchayan left a comment

Thanks for your PR @AlexKaracaoglu - I think adding an integ test, or modifying an existing one to use this new instance type, would be nice to confirm that it deploys successfully.

Also, can you add a unit test for the isGpuInstanceType function?

@pahud
Contributor

pahud commented Oct 6, 2024

Hi @AlexKaracaoglu

Not sure if you are still on it, but you can refer to this commit I made:
pahud@8fd3db8

@AlexKaracaoglu
Author

@pahud @sumupitchayan - Appreciate the follow-ups. I'll get some changes in later today.

@AlexKaracaoglu AlexKaracaoglu changed the title chore(eks): update nodegroup gpu check and add g6e instance class chore(eks): update nodegroup gpu check Oct 11, 2024
@paulhcsun paulhcsun changed the title chore(eks): update nodegroup gpu check feat(eks): update nodegroup gpu check Nov 5, 2024
Contributor

@paulhcsun paulhcsun left a comment

Hi @AlexKaracaoglu, as mentioned by Sumu, could you use the new instance types in an existing integ test? You can use this PR as a reference: #27373. I can also help run the integ test, if needed, once you add the changes.

@paulhcsun paulhcsun self-assigned this Nov 5, 2024
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Nov 5, 2024
@AlexKaracaoglu
Author

AlexKaracaoglu commented Nov 22, 2024

Hey @paulhcsun - I appreciate the follow-up. I added some changes to the integration tests to support the GPU nodegroup case. I am able to partly run the integ tests but cannot perform a real deployment. Would you be able to assist me with running the tests and getting the snapshots updated now that the changes are in?

@aws-cdk-automation
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.

@aws-cdk-automation
Collaborator

This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.

@aws-cdk-automation aws-cdk-automation added the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Dec 5, 2024

github-actions bot commented Dec 5, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2024
@paulhcsun paulhcsun reopened this Dec 12, 2024
@paulhcsun paulhcsun added the pr-linter/do-not-close The PR linter will not close this PR while this label is present label Dec 12, 2024
@paulhcsun
Contributor

Hi @AlexKaracaoglu, apologies for letting this PR drop; I totally missed your comment. I've re-opened the PR and added the pr-linter/do-not-close label.

I was able to run the integ tests successfully, but I'm not able to push the changes to your fork because I do not have access rights to the libertymutual fork. Is it possible to temporarily grant me access so that I can push the updated snapshots?

@aws-cdk-automation
Collaborator

The pull request linter fails with the following errors:

❌ Features must contain a change to a README file.
❌ Features must contain a change to an integration test file and the resulting snapshot.

PRs must pass status checks before we can provide a meaningful review.

If you would like to request an exemption from the status checks or clarification on feedback, please leave a comment on this PR containing Exemption Request and/or Clarification Request.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: ad301b7
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@paulhcsun
Contributor

Hi @AlexKaracaoglu, I'm still not able to push to the libertymutual fork, as mentioned in my previous comment, and I haven't received a response. Given the importance of your contribution and the need to move forward with the project, we've decided to take the following approach:

  1. We've created a new PR, feat(eks): update nodegroup gpu check #32715, that incorporates your changes along with the necessary integration test snapshots.
  2. In this new PR, we've credited you as the original author and linked to your original PR to ensure your work is properly acknowledged.
  3. You've been tagged as a co-author to maintain clear attribution of your contribution.

We believe this approach allows us to integrate your work while maintaining project momentum.

I will be closing this PR in favor of the other. If you have any questions, concerns, or would like to discuss this further, please don't hesitate to reach out. We truly appreciate your work and hope you'll continue to contribute to our project in the future.

@paulhcsun paulhcsun closed this Jan 2, 2025
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. effort/small Small work item – less than a day of effort p2 pr-linter/do-not-close The PR linter will not close this PR while this label is present
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aws-eks: manage nodegroups GPU instance types not up to date
6 participants