Unable to run neuron device plugin on EKS with containerd only #794
Comments
I believe these:
Are related to aws-neuron/aws-neuron-driver#6 but not part of the issue described here
@bryantbiggs thanks for your report. We will investigate and respond soon.
@bryantbiggs when using EKS it is unnecessary to install aws-neuronx-oci-hook. The only necessary package to install on worker nodes is the driver. Could you please try again, only installing
@james-aws - just to clarify, the Does that also mean that the containerd config.toml should not be updated as well, or is there a different config for that since the hooks aren't used?
default_runtime_name = "neuron"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.neuron]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.neuron.options]
BinaryName = "/opt/aws/neuron/bin/oci_neuron_hook_wrapper.sh"
when using eks
I assume you mean that when using the supplied EKS AMIs they are pre-installed, since if you are supplying your own AMI, running on EKS makes no difference.
I am trying to set up the Neuron device plugin on EKS with a custom AL2023 AMI but I am getting the following error:
In the containerd journalctl logs I am seeing these log lines, but I can't track down any info on this so far:
I am not installing any Docker components; I am only using containerd (this is the norm starting in EKS 1.24+). I have installed the following on the AMI:
The source AMI is ami-0d4df6583e939a1c4 (us-east-1), which is the latest Amazon Linux 2023 minimal - all of the EKS components have been installed and validated (kubelet, containerd, etc.)
The containerd config in use:
The Neuron device plugin daemonset in use:
With clusterrole:
From dmesg: