imagepullbackoff on the nvidia-operator w/ nvcr.io/nvidia/cuda #560

Open
jayunit100 opened this issue Jul 28, 2023 · 3 comments

jayunit100 commented Jul 28, 2023

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

yes

1. Issue or feature description

We're seeing ImagePullBackOff errors on the operator's pods:

 kubectl get pods -A | grep nvidia
gpu-operator-resources   nvidia-container-toolkit-daemonset-xwp6p                      0/1     Init:ImagePullBackOff   0              73m
gpu-operator-resources   nvidia-driver-daemonset-jk2t8                                 1/1     Running                 12 (12m ago)   73m
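For triage, the stuck pods can be picked out of output like the above by filtering on the STATUS column. This is a minimal sketch that assumes the `kubectl get pods -A` column layout shown here (NAMESPACE, NAME, READY, STATUS, ...). `stuck_on_pull` is a hypothetical helper, not part of any tool:

```shell
# Print namespace/name for pods whose STATUS reports an image-pull failure.
# Reads `kubectl get pods -A` output on stdin; assumes STATUS is column 4
# (RESTARTS may contain spaces, so only the first four columns are trusted).
stuck_on_pull() {
  awk '$4 ~ /ImagePullBackOff|ErrImagePull/ { print $1 "/" $2 }'
}
```

Usage: `kubectl get pods -A | stuck_on_pull`. You can then run `kubectl describe pod -n gpu-operator-resources <name>` to see which image the failing init container references, along with the back-off events.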

like so:

  Back-off pulling image "nvcr.io/nvidia/cuda@sha256:ed723a1339cddd75eb9f2be2f3476edf497a1b189c10c9bf9eb8da4a16a51a59"

The workaround we found was to remove the tag (in this case the @sha256 digest) from the nvcr.io/nvidia/cuda image reference so that it uses latest.
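The failing reference in the back-off message is pinned by digest (`@sha256:...`). A reference with neither a tag nor a digest is resolved as the implicit `latest` tag, so removing the digest changes which image gets pulled. The sketch below classifies references this way. `ref_kind` is a hypothetical helper, and it ignores rarer reference forms:

```shell
# Classify an image reference: "digest <sha>", "tag <tag>", or the implicit
# latest tag. Only the last path component is checked for ":" so that a
# registry port (e.g. localhost:5000/img) is not mistaken for a tag.
ref_kind() {
  local ref last
  ref=$1
  case $ref in
    *@sha256:*)
      echo "digest ${ref#*@}"   # everything after "@" is the digest
      return 0
      ;;
  esac
  last=${ref##*/}               # e.g. "cuda:12.2.0-base-ubuntu22.04"
  case $last in
    *:*) echo "tag ${last#*:}" ;;
    *)   echo "tag latest (implicit)" ;;
  esac
}
```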

1. Reproducing

We'll add details shortly; we just wanted to file this in case others are hitting it as well.

jayunit100 changed the title from "imagepullbackoff" to "imagepullbackoff on the nvidia-operator w/ nvcr.io/nvidia/cuda" on Jul 28, 2023
shivamerla (Contributor)

@jayunit100, which version of the GPU Operator is this? Please note that CUDA base images are used as initContainers in some pods deployed by the operator, and their version can be controlled in the ClusterPolicy.
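The following sketch shows one way to set that version. It assumes, without confirmation from this thread, that the ClusterPolicy exposes the init container image as `spec.operator.initContainer.{repository,image,version}` and that the object is named `cluster-policy`. Check both against the CRD shipped with your operator version before applying anything. `init_container_patch` is a hypothetical helper that only prints a JSON merge patch:

```shell
# Print a JSON merge patch that points the operator's init containers at a
# specific CUDA image. ASSUMPTION: the field path spec.operator.initContainer
# exists in your ClusterPolicy CRD; verify before use.
init_container_patch() {
  printf '{"spec":{"operator":{"initContainer":{"repository":"%s","image":"%s","version":"%s"}}}}\n' \
    "$1" "$2" "$3"
}
```

If those assumptions hold, the patch could be applied with `kubectl patch clusterpolicy cluster-policy --type merge -p "$(init_container_patch nvcr.io/nvidia cuda 12.2.0-base-ubuntu22.04)"`.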

jayunit100 (Author) commented Aug 7, 2023

Hi @shivamerla, I don't recall the operator version. Can you suggest a few image SHAs that I can try, though?

Or is there a way to query the image tags in NVIDIA's repos? I tried using:

-> % imgpkg tag list -i  nvcr.io/nvidia/cuda    
imgpkg: Error: unrecognized challenge: 

but it looks like they're not accessible as a standard Docker repo that way, and nvcr.io says authorization is required.
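Some background on the "unrecognized challenge" error. In the Docker Registry HTTP API V2, tags are listed at `/v2/<name>/tags/list`. An anonymous request is normally answered with `401 Unauthorized` and a `WWW-Authenticate: Bearer realm="...",service="...",scope="..."` header, and the client is expected to fetch a token from `realm` and retry. The error suggests imgpkg did not handle the challenge nvcr.io returned, but this thread does not confirm that, nor whether nvcr.io grants anonymous tokens for `nvidia/cuda`. The sketch below only parses such a header and builds the token request URL. `challenge_param` and `token_url` are hypothetical helpers, and the header in the test is made up:

```shell
# Extract one key="value" parameter from a WWW-Authenticate header value.
# Prints nothing if the key is absent.
challenge_param() {
  printf '%s\n' "$1" | sed -n "s/.*$2=\"\([^\"]*\)\".*/\1/p"
}

# Build the token endpoint URL (realm?service=...&scope=...) from a Bearer
# challenge, following the Docker token-auth flow; fails if realm is missing.
token_url() {
  local realm service scope
  realm=$(challenge_param "$1" realm)
  service=$(challenge_param "$1" service)
  scope=$(challenge_param "$1" scope)
  [ -n "$realm" ] || return 1
  printf '%s?service=%s&scope=%s\n' "$realm" "$service" "$scope"
}
```

To see the challenge nvcr.io actually returns, run `curl -si https://nvcr.io/v2/nvidia/cuda/tags/list`. Clients that implement this flow themselves, such as `regctl tag ls nvcr.io/nvidia/cuda` or `skopeo list-tags docker://nvcr.io/nvidia/cuda`, may work where imgpkg did not.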

shivamerla (Contributor)

You can try using this image:

nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
 sudo docker regctl manifest get nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
Name:        nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
MediaType:   application/vnd.docker.distribution.manifest.list.v2+json
Digest:      sha256:f8870283bea6a85ba4b4a5e1b65158dd15e8009e433539e7c83c94707e703a1b
             
Manifests:   
             
  Name:      nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04@sha256:51c9b445ee2a1eb94631ed5dabc755e915db7485fee3cc5c754df9298b16e81e
  Digest:    sha256:51c9b445ee2a1eb94631ed5dabc755e915db7485fee3cc5c754df9298b16e81e
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/arm64
             
  Name:      nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04@sha256:1069ccd2910506f68e1d7c0907a32aaa877b8038d1aa24cb7ffb2d2a85d725c7
  Digest:    sha256:1069ccd2910506f68e1d7c0907a32aaa877b8038d1aa24cb7ffb2d2a85d725c7
  MediaType: application/vnd.docker.distribution.manifest.v2+json
  Platform:  linux/amd64
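In the output above, the top-level Digest (sha256:f887...) identifies the multi-arch manifest list. Each entry under Manifests is the image for a single platform. Pinning to the list digest lets the runtime pick the matching architecture, while pinning to an entry's digest selects exactly one. The sketch below pulls a platform's digest out of regctl's text output and assumes the layout shown here. `platform_digest` is hypothetical, and parsing human-readable output is fragile, so a machine-readable output mode is preferable if your regctl version has one:

```shell
# Print the manifest digest for one platform (e.g. linux/amd64) from
# `regctl manifest get` text output on stdin. Relies on each entry listing
# its "Digest:" line before its "Platform:" line; exits 1 if not found.
platform_digest() {
  awk -v want="$1" '
    $1 == "Digest:" { digest = $2 }
    $1 == "Platform:" && $2 == want { print digest; found = 1; exit }
    END { exit !found }
  '
}
```

Usage: `regctl manifest get nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04 | platform_digest linux/amd64`. Then reference the image as `nvcr.io/nvidia/cuda@<digest>`.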
