Is AMI for OS: AL2023, GPU support: true, architecture: x86_64 missing? #1526
Comments
Hey @shashiranjan84, sorry you're running into this! AL2023 with GPU support was added to EKS after we added AL2023 support to the provider.
Thanks @flostadler. We deploy in multiple regions, so I was trying to avoid hardcoding any AMI ID.
You do not have to hardcode the AMI ID. You can retrieve the region-specific AMI from SSM Parameter Store like this:
|
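A minimal sketch of how such a lookup could be prepared, assuming the SSM parameter path that appears in the code further down this thread. The helper `al2023AmiParameterName` and its type names are hypothetical and not part of any Pulumi package; it only builds the parameter name, and not every architecture/variant combination is necessarily published by AWS.

```typescript
// Hypothetical helper, for illustration only: builds the SSM parameter name
// under which the recommended EKS-optimized AL2023 AMI ID is published.
type Arch = 'x86_64' | 'arm64';
type Variant = 'standard' | 'nvidia' | 'neuron';

function al2023AmiParameterName(k8sVersion: string, arch: Arch, variant: Variant): string {
    // The path layout follows the parameter used elsewhere in this thread.
    return `/aws/service/eks/optimized-ami/${k8sVersion}/amazon-linux-2023/${arch}/${variant}/recommended/image_id`;
}

const parameterName = al2023AmiParameterName('1.31', 'x86_64', 'nvidia');
console.log(parameterName);
```

In a Pulumi program the resulting name would typically be passed to `aws.ssm.getParameter({ name })`, and the returned `value` used as the AMI ID. Because the parameter is resolved per region, nothing has to be hardcoded.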
AWS added two new GPU-capable optimized AMIs for AL2023. One is for Nvidia-based instances, the other is for Neuron-based instances. Adding support for the Nvidia-based one is rather easy, but before adding Neuron support we'll need to extend the AMI selection to be instance type aware. So far it's only architecture aware.
Makes sense
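To illustrate what "instance type aware" selection could mean, here is a rough sketch that picks an AMI variant from the instance family. This is not the provider's implementation; the function name and the family lists are assumptions chosen for illustration and are not exhaustive.

```typescript
// Illustration only: choose an AMI variant from the instance type's family.
type AmiVariant = 'standard' | 'nvidia' | 'neuron';

// Assumed family lists, not exhaustive and not taken from the provider.
const NVIDIA_FAMILIES: string[] = ['g4dn', 'g5', 'g6', 'p3', 'p4d', 'p5'];
const NEURON_FAMILIES: string[] = ['inf1', 'inf2', 'trn1'];

function variantForInstanceType(instanceType: string): AmiVariant {
    // 'g5.2xlarge' -> family 'g5'
    const dot = instanceType.indexOf('.');
    const family = dot === -1 ? instanceType : instanceType.slice(0, dot);
    if (NVIDIA_FAMILIES.indexOf(family) !== -1) {
        return 'nvidia';
    }
    if (NEURON_FAMILIES.indexOf(family) !== -1) {
        return 'neuron';
    }
    return 'standard';
}

console.log(variantForInstanceType('g5.2xlarge'));
console.log(variantForInstanceType('inf2.xlarge'));
```

An architecture-only selection cannot make this distinction, because the Nvidia and Neuron instance families are both x86_64.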
I was trying to switch to the AL2023 GPU AMI, but after updating I am not seeing any nodes. I was expecting a rolling update of the nodes, but now I am seeing no nodes.
import * as pulumi from '@pulumi/pulumi';
import * as aws from '@pulumi/aws';
import * as eks from '@pulumi/eks';

const EKS_VERSION = '1.29';
// Look up the region-specific AL2023 Nvidia AMI ID from SSM Parameter Store.
const ami = pulumi.interpolate`/aws/service/eks/optimized-ami/${EKS_VERSION}/amazon-linux-2023/x86_64/nvidia/recommended/image_id`.apply((name) =>
    aws.ssm.getParameter({ name }, { async: true, provider: ... }),
).apply((result) => result.value);
const cluster = new eks.Cluster(
    `${regionalNamespace}-cluster`,
    {
        name: `${regionalNamespace}`,
        version: EKS_VERSION,
        vpcId: ...,
        privateSubnetIds: ...,
        publicSubnetIds: ...,
        enabledClusterLogTypes: ['api', 'audit', 'authenticator'],
        tags: projectTags,
        endpointPrivateAccess: true,
        endpointPublicAccess: true,
        nodeAssociatePublicIpAddress: false,
        providerCredentialOpts: {
            profileName: aws.config.profile,
        },
        roleMappings: [
            ...
        ],
        instanceType: 'g5.2xlarge',
        // gpu: true,
        nodeAmiId: ami, // explicit AMI override for the default node group
        nodeRootVolumeSize: 200,
        ...,
    },
    { provider: ... },
);
To give some context, we are currently on Kubernetes version 1.29 and trying to upgrade to 1.31. The EKS and Kubernetes plugin versions are 2.2.1 and 4.8.1 respectively, which we are also planning to upgrade to the latest. What would be the best migration approach to avoid downtime?
@shashiranjan84 the EKS provider 2.x.x does not support AL2023 or Bottlerocket. You'll need to upgrade to version 3 of the provider. Self-managed node groups (like the cluster's default node group) generally require more careful handling to guarantee zero-downtime updates. If possible, I'd recommend switching to managed node groups or EKS Auto Mode instead. I'd recommend first upgrading to EKS provider version 3 by following this guide: https://www.pulumi.com/registry/packages/eks/how-to-guides/v3-migration. It shouldn't replace your existing node groups if you set the |
After updating to EKS 3.5 and updating the default node group (after setting the AMI ID), we are seeing this error at the end of the deployment:
|
@shashiranjan84 this sounds like a separate problem. Can you please open another issue for this and include code and steps to reproduce it? Thanks a lot! We can dig into it there; feel free to tag me.
@flostadler I also noticed that when I create a managed node group with an AMI ID, the worker node instances do not have the same AMI ID.
Is that expected?
@shashiranjan84 Only the nodes that are part of the managed node group will have that AMI ID. The nodes that are part of the default self-managed node group will have a different AMI ID.
This change adds support for the AL2023 x86_64 GPU optimized AMI. See [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html) for a list of supported AMIs. The AMI type (`AL2023_x86_64_NVIDIA`) is taken from the [AWS API schema](https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html#AmazonEKS-CreateNodegroup-request-amiType). Note: adding support for the Neuron based AMI type is tracked in #1526. This will require making the AMI selection instance type aware. Relates to #1526
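As a rough illustration of an architecture-and-GPU keyed lookup like the one this change extends, the sketch below maps the two inputs to an EKS `amiType` value. It is an assumption-laden example, not the provider's code: the function name, the table, and the error message are hypothetical, and the table only lists the standard and Nvidia AL2023 types.

```typescript
// Illustration only: resolve an EKS amiType for AL2023 from architecture and GPU flag.
type CpuArch = 'x86_64' | 'arm64';

// Keys are `${arch}:${gpu}`; combinations not listed here are treated as unsupported.
const AL2023_AMI_TYPES: Record<string, string> = {
    'x86_64:false': 'AL2023_x86_64_STANDARD',
    'arm64:false': 'AL2023_ARM_64_STANDARD',
    'x86_64:true': 'AL2023_x86_64_NVIDIA',
};

function al2023AmiType(arch: CpuArch, gpu: boolean): string {
    const amiType = AL2023_AMI_TYPES[`${arch}:${gpu}`];
    if (amiType === undefined) {
        // Hypothetical message; the provider's real error text may differ.
        throw new Error(`No AL2023 AMI type for architecture: ${arch}, GPU support: ${gpu}`);
    }
    return amiType;
}

console.log(al2023AmiType('x86_64', true));
```

Because the key holds only architecture and a GPU flag, it cannot distinguish Nvidia from Neuron instances, which is why Neuron support needs instance type awareness.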
Here I am explicitly providing the same node AMI ID for both the self-managed and the managed node group, assuming both will come up with the same GPU-optimized AMI ID. Even when I hardcoded the AMI ID in the managed node group, the worker node in the managed group showed a different AMI ID, as if the AMI ID property were completely ignored.
FYI version v3.6.0 was released with support for Nvidia-based GPUs on AL2023. I'm closing this issue for now and have opened #1561 for adding Neuron support. @shashiranjan84 I'll continue looking into the other issues you've opened, but you won't need to use the AMI override anymore. The provider should now select the appropriate AMI for all instances with NVIDIA GPUs.
Thanks a lot!
What happened?
I do not see any entry for the AL2023 GPU-optimized AMI here, but I do see that AWS has an optimized AMI for Nvidia.
I am trying to update the K8s version from 1.29 to 1.31 and have also updated Pulumi EKS from 2.2.1 to 3.4.0.
Example
Output of pulumi about
CLI
Version 3.142.0
Go Version go1.23.3
Go Compiler gc
Host
OS debian
Version 11.7
Arch x86_64
Backend
Name fv-az1490-728
URL s3://staging-pulumi-state-io
User root
Organizations
Token type personal
Additional context
No response