I have encountered a couple of issues while using the nim-operator, and I wanted to share them along with the solutions I found.
Initial Startup Error:
I received the following error during the startup of nim-operator:
E1107 03:29:01.158923 1 reflector.go:158] "Unhandled Error" err="sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v2.HorizontalPodAutoscaler: failed to list *v2.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:k8s-nim-operator-system:k8s-nim-operator-controller-manager" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" at the cluster scope" logger="UnhandledError"
Upon investigation, I discovered that config/rbac/role.yaml misspells the resource name: horizontalpodautoscalars should be horizontalpodautoscalers.
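For anyone hitting the same error, here is a rough sketch of how the typo and the resulting permission could be checked. The function names are hypothetical helpers, not part of nim-operator; the service account name is copied from the error message above, and the `--as` impersonation assumes your own user is allowed to impersonate service accounts.

```shell
# Hypothetical helper: report lines in an RBAC manifest that misspell the HPA resource.
# $1: path to the role manifest, e.g. config/rbac/role.yaml
check_hpa_spelling() {
  if grep -n 'horizontalpodautoscalars' "$1"; then
    echo "misspelled resource found in $1" >&2
    return 1
  fi
  return 0
}

# Hypothetical helper: ask the API server whether the controller's service
# account may list HPAs across all namespaces (requires a reachable cluster).
can_list_hpas() {
  kubectl auth can-i list horizontalpodautoscalers.autoscaling \
    --as="system:serviceaccount:k8s-nim-operator-system:k8s-nim-operator-controller-manager" \
    --all-namespaces
}
```

Running `check_hpa_spelling config/rbac/role.yaml` before deploying, and `can_list_hpas` after, should be enough to tell whether the corrected role was actually applied.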
After correcting the spelling mistake, the nim-operator reported another error:
2024-11-07T04:24:46Z INFO Starting Controller {"controller": "nimservice", "controllerGroup": "apps.nvidia.com", "controllerKind": "NIMService"}
2024-11-07T04:24:46Z ERROR controller-runtime.source.EventHandler if kind is a CRD, it should be installed before calling Start {"kind": "NemoGuardrail.apps.nvidia.com", "error": "no matches for kind "NemoGuardrail" in version "apps.nvidia.com/v1alpha1""}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1.1
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:71
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:53
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/loop.go:54
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind[...]).Start.func1
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/source/kind.go:64
This issue was resolved by adding bases/apps.nvidia.com_nemoguardrails.yaml to config/crd/kustomization.yaml, so that the NemoGuardrail CRD is installed before the controller starts.
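A similar gap could exist for other CRDs, so here is a minimal sketch that lists any file under config/crd/bases that kustomization.yaml does not reference. The helper name `unlisted_crds` is hypothetical, and the check is a plain text match that ignores commented-out lines; it does not parse the YAML.

```shell
# Hypothetical helper: print CRD files under <crd_dir>/bases that are not
# referenced by an uncommented line in <crd_dir>/kustomization.yaml.
# $1: the CRD directory, e.g. config/crd
unlisted_crds() {
  crd_dir="$1"
  for f in "$crd_dir"/bases/*.yaml; do
    # Skip the literal glob pattern when the directory has no YAML files.
    [ -e "$f" ] || continue
    name="bases/$(basename "$f")"
    # Drop comment lines, then match the name as a fixed string (-F) so the
    # dots in the file name are not treated as regex wildcards.
    grep -v '^[[:space:]]*#' "$crd_dir/kustomization.yaml" | grep -qF -- "$name" \
      || echo "$name"
  done
}
```

In this case, `unlisted_crds config/crd` would have been expected to print bases/apps.nvidia.com_nemoguardrails.yaml before the fix and nothing for that file afterwards.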
With these changes, nim-operator is now running correctly:
kubectl get po -n k8s-nim-operator-system
NAME READY STATUS RESTARTS AGE
k8s-nim-operator-controller-manager-68d67d4b69-rr2rn 1/1 Running 0 177m
Additionally, I have made the relevant code changes. Would it be possible for me to submit these modifications to the community?
Thank you!
@wqlparallel Thanks for pointing this out. Yes, please feel free to raise a PR with your suggested changes. With the Helm chart these are set up correctly, but installing directly from the generated manifests will fail. You need to update the RBAC here and generate the manifests again.