Describe the bug
When deploying a new SeldonDeployment and expecting the pods to be restarted with the new version, not all pods are created with the new version, and kube-controller-manager reports:
"Error syncing deployment" deployment="analytics/name_of_the_sdep" err="Operation cannot be fulfilled on replicasets.apps \"pod-name-7894b6f9dd\": the object has been modified; please apply your changes to the latest version and try again"
I haven't checked with Seldon Core V2. This is on Seldon Core V1, and it has been happening since Seldon Core 1.16.0.
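For context, the "object has been modified" message is the API server's standard conflict response. Every object carries a resourceVersion, and an update computed from a stale version is rejected. The toy model below is not Kubernetes or Seldon code. `Store`, `Conflict`, and `update_with_retry` are hypothetical names used only to illustrate why two writers touching the same object can produce this error, and why re-reading before retrying normally clears it.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale resourceVersion."""


class Store:
    """Toy single-object store mimicking optimistic concurrency."""

    def __init__(self, spec):
        self.spec = dict(spec)
        self.resource_version = 1

    def get(self):
        # Return a snapshot: a copy of the spec plus the version it was read at.
        return dict(self.spec), self.resource_version

    def update(self, spec, resource_version):
        # Reject writes that were computed from an outdated snapshot.
        if resource_version != self.resource_version:
            raise Conflict("the object has been modified; please apply "
                           "your changes to the latest version and try again")
        self.spec = dict(spec)
        self.resource_version += 1


def update_with_retry(store, mutate, attempts=5):
    """Re-read and re-apply `mutate` until the write is accepted."""
    for _ in range(attempts):
        spec, version = store.get()
        mutate(spec)
        try:
            store.update(spec, version)
            return True
        except Conflict:
            continue  # another writer got in first; fetch the latest and retry
    return False


if __name__ == "__main__":
    store = Store({"image": "model:v1", "replicas": 3})
    spec_a, version_a = store.get()      # writer A reads
    spec_b, version_b = store.get()      # writer B reads the same version
    spec_a["image"] = "model:v2"
    store.update(spec_a, version_a)      # A's write is accepted
    spec_b["replicas"] = 4
    try:
        store.update(spec_b, version_b)  # B's snapshot is now stale
    except Conflict as exc:
        print("conflict:", exc)
    update_with_retry(store, lambda s: s.update(replicas=4))
    print(store.spec)
```

A single conflict is therefore usually harmless. The pattern in this report is worth attention because the conflicts coincide with pods that never receive the new version.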
To reproduce
Deploy a model in K8s with multiple pods running
Deploy the model again with a change, for example another Docker tag
Observe that not all pods are updated
kube-controller-manager reports the error mentioned above
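One way to confirm the third step is to group pods by container image. The sketch below assumes the pod list has been saved with `kubectl get pods -n <namespace> -o json` and loaded into a dict. The helper name `pods_by_image` and the sample pod names and image tags are made up for illustration.

```python
from collections import defaultdict


def pods_by_image(pod_list):
    """Map each container image to the sorted names of the pods running it."""
    grouped = defaultdict(list)
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            grouped[container["image"]].append(name)
    return {image: sorted(names) for image, names in grouped.items()}


if __name__ == "__main__":
    # Hypothetical sample in the shape of `kubectl get pods -o json` output.
    sample = {
        "items": [
            {"metadata": {"name": "model-0-regressor-aaa"},
             "spec": {"containers": [{"name": "regressor", "image": "model:v2"}]}},
            {"metadata": {"name": "model-0-regressor-bbb"},
             "spec": {"containers": [{"name": "regressor", "image": "model:v2"}]}},
            {"metadata": {"name": "model-0-regressor-ccc"},
             "spec": {"containers": [{"name": "regressor", "image": "model:v1"}]}},
        ]
    }
    for image, names in sorted(pods_by_image(sample).items()):
        print(f"{image}: {', '.join(names)}")
```

With real data, replace `sample` with `json.load(sys.stdin)` and pipe the kubectl output into the script. More than one tag for the model container after the rollout has settled indicates the partial update described here.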
Expected behaviour
All pods should be updated with the latest version of the Docker image, but that is not the case.
They should all be 21m old, but for some of them the deployment sync has errored.
Environment
Cloud Provider: EKS
Kubernetes Cluster Version: 1.24 / 1.25
Deployed Seldon System: 1.16.0
Model Details
Images of your model: [Output of: kubectl get seldondeployment -n <yourmodelnamespace> <seldondepname> -o yaml | grep image: where <yourmodelnamespace>]
Logs of your model: [You can get the logs of your model by running kubectl logs -n <yourmodelnamespace> <seldonpodname> <container>]
Logs of seldon-controller-manager:
{"volumes":[{"name":"regressor-provision-location","emptyDir":{}},{"name":"seldon-podinfo","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.annotations"}}],"defaultMode":420}}],"initContainers":[{"name":"regressor-model-initializer","image":"kserve/storage-initializer:v0.9.0","args":["s3://amazing-bucket-name/amazing-model-name/amazing-model-name/2023-08-07T01-30-00/","/mnt/models"],"resources":{},"volu
{"level":"info","ts":1691681150.4964962,"logger":"controllers.SeldonDeployment","msg":"The deployments are the same - api server defaults ignored","SeldonDeployment":"analytics/amazing-model-name"}
{"level":"info","ts":1691681150.4965105,"logger":"controllers.SeldonDeployment","msg":"Found identical deployment","SeldonDeployment":"analytics/amazing-model-name","namespace":"analytics","name":"amazing-model-name-prediction-pipeline-0-regressor","status":{"observedGeneration":93042,"replicas":3,"updatedReplicas":3,"readyReplicas":3,"availableReplicas":3,"conditions":[{"type":"Available","status":"True","lastUpdateTime":"2023-08-09T07:21:30Z","lastTransitionTime":"2023-08-09T07
{"level":"info","ts":1691681150.496532,"logger":"controllers.SeldonDeployment","msg":"Deployment status","SeldonDeployment":"analytics/amazing-model-name","name":"amazing-model-name-prediction-pipeline-0-regressor","status":{"observedGeneration":93042,"replicas":3,"updatedReplicas":3,"readyReplicas":3,"availableReplicas":3,"conditions":[{"type":"Available","status":"True","lastUpdateTime":"2023-08-09T07:21:30Z","lastTransitionTime":"2023-08-09T07:21:30Z","reason":"MinimumReplicas
{"level":"info","ts":1691681150.4965816,"logger":"controllers.SeldonDeployment","msg":"Found identical Service","SeldonDeployment":"analytics/amazing-model-name","all":true,"namespace":"analytics","name":"amazing-model-name-prediction-pipeline-regressor","status":{"loadBalancer":{}}}
{"level":"info","ts":1691681150.496616,"logger":"controllers.SeldonDeployment","msg":"Found identical Service","SeldonDeployment":"analytics/amazing-model-name","all":true,"namespace":"analytics","name":"amazing-model-name-prediction-pipeline","status":{"loadBalancer":{}}}
{"level":"info","ts":1691681150.496658,"logger":"controllers.SeldonDeployment","msg":"Removing unused services","SeldonDeployment":"analytics/amazing-model-name"}
{"level":"info","ts":1691681150.4968553,"logger":"controllers.SeldonDeployment","msg":"Reconcile called","SeldonDeployment":"analytics/ml-prep-times"}
{"level":"info","ts":1691681150.49691,"logger":"seldondeployment","msg":"Defaulting Seldon Deployment called","name":"ml-prep-times"}
{"level":"info","ts":1691681150.4969313,"logger":"controllers.SeldonDeployment","msg":"pSvcName","SeldonDeployment":"analytics/ml-prep-times","val":"ml-prep-times-graph-main"}
{"level":"info","ts":1691681150.4972584,"logger":"controllers.SeldonDeployment","msg":"Found identical Service","SeldonDeployment":"analytics/ml-prep-times","all":false,"namespace":"analytics","name":"ml-prep-times-graph-main-main","status":{"loadBalancer":{}}}
{"level":"info","ts":1691681150.4972935,"logger":"controllers.SeldonDeployment","msg":"Found identical Service","SeldonDeployment":"analytics/ml-prep-times","all":false,"namespace":"analytics","name":"ml-prep-times-graph-main","status":{"loadBalancer":{}}}
{"level":"info","ts":1691681150.4973407,"logger":"controllers.SeldonDeployment","msg":"Found identical HPA","SeldonDeployment":"analytics/ml-prep-times","namespace":"analytics","name":"ml-prep-times-graph-main-0-main","status":{"lastScaleTime":"2023-08-09T07:11:01Z","currentReplicas":2,"desiredReplicas":2,"currentMetrics":[{"type":"Resource","resource":{"name":"memory","current":{"averageValue":"429568k","averageUtilization":40}}},{"type":"Resource","resource":{"name":"cpu","current":{
{"level":"info","ts":1691681150.49742,"logger":"controllers.SeldonDeployment","msg":"Scheme","SeldonDeployment":"analytics/ml-prep-times","r.scheme":{}}
{"level":"info","ts":1691681150.4974318,"logger":"controllers.SeldonDeployment","msg":"createDeployments","SeldonDeployment":"analytics/ml-prep-times","deploy":{"namespace":"analytics","name":"ml-prep-times-graph-main-0-main"}}
{"level":"info","ts":1691681150.5027092,"logger":"controllers.SeldonDeployment","msg":"Updating Deployment","SeldonDeployment":"analytics/ml-prep-times","namespace":"analytics","name":"ml-prep-times-graph-main-0-main"}
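To see which SeldonDeployment the controller keeps reconciling, the JSON log lines can be tallied by their `SeldonDeployment` and `msg` fields, both of which appear in the excerpt above. This is a rough sketch. `count_messages` is a hypothetical helper, and it skips lines that are truncated or not valid JSON, as several lines in the excerpt are.

```python
import json
from collections import Counter


def count_messages(lines):
    """Count (SeldonDeployment, msg) pairs in controller JSON log lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if not isinstance(record, dict):
            continue
        # Lines without a SeldonDeployment field are grouped under "?".
        key = (record.get("SeldonDeployment", "?"), record.get("msg", "?"))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"level":"info","msg":"Reconcile called","SeldonDeployment":"analytics/ml-prep-times"}',
        '{"level":"info","msg":"Updating Deployment","SeldonDeployment":"analytics/ml-prep-times"}',
        '{"volumes":[{"name":"regressor-provision-location","emptyDir"',  # truncated
    ]
    for (sdep, msg), n in count_messages(sample).most_common():
        print(f"{n:6d}  {sdep}  {msg}")
```

A large count of "Updating Deployment" for one SeldonDeployment, alongside "Found identical deployment" for the others, would point at the resource that is being rewritten on every reconcile.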
stephen37 changed the title from "Error syncing deployment - Operation cannot be fulfilled on replicasets.apps" to "V1: Error syncing deployment - Operation cannot be fulfilled on replicasets.apps" on Aug 10, 2023
Hi @stephen37 @cliveseldon, we are facing a similar issue: old pods are not deleted when the new pods are up. Is this issue resolved in later versions of Seldon?
We have a very similar issue with the same logs: the controller keeps saying that the deployments are the same and then tries to reconcile anyway. Nothing actually seems to happen and the model seems fine, though argocd says the deployment is stuck. I don't fully understand what is going on; the deployment is similar to all our other deployments. We have set the number of replicas to a fixed number, but it still happens. Any thoughts on why the operator might think there are duplicate deployments and services?
When I describe the SeldonDeployment I see this:
Type    Reason   Age                        From                       Message
----    ------   ----                       ----                       -------
Normal  Updated  3m47s (x1451532 over 20h)  seldon-controller-manager  Updated SeldonDeployment "xxx"
edit: I did find the issue and reported it here: #5435
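The Age column in the event above gives a sense of how tight the reconcile loop is. The sketch below turns a string like `3m47s (x1451532 over 20h)` into an approximate event rate. The helper names are hypothetical, and because kubectl rounds the "over" window, the result is only an estimate.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_seconds(text):
    """Convert a kubectl-style duration such as '3m47s' or '20h' to seconds."""
    parts = re.findall(r"(\d+)([smhd])", text)
    return sum(int(value) * _UNITS[unit] for value, unit in parts)


def event_rate(age):
    """Return events per second for an Age like '3m47s (x1451532 over 20h)'."""
    match = re.search(r"\(x(\d+) over ([0-9smhd]+)\)", age)
    if match is None:
        raise ValueError(f"no repeat count in {age!r}")
    count = int(match.group(1))
    window = duration_seconds(match.group(2))
    return count / window


if __name__ == "__main__":
    rate = event_rate("3m47s (x1451532 over 20h)")
    print(f"{rate:.1f} events per second")
```

For the event shown, this works out to roughly 20 "Updated" events per second sustained over 20 hours, which is consistent with a controller that never considers the resource settled.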