Existing Thanos deployments breaking when being updated to any of the latest helm charts #31237

bzlom · 2025-01-07T15:58:53Z

Name and Version

thanos/15.9.2

What architecture are you using?

arm64

What steps will reproduce the bug?

1. Deploy an earlier version of the chart (for example 15.7.27, which is the version I am running).
2. Try to upgrade to version 15.9.x.

Are you using any custom parameters or values?

global:
  imagePullSecrets:
    - some_secret
  security:
    allowInsecureImages: true

image:
  registry: some_registry
  repository: thanos/thanos
  tag: v0.37.2
  pullSecrets:
    - some_secret

objstoreConfig: |-
  type: s3
  config:
    bucket: some_bucket
    endpoint: s3.eu-west-1.amazonaws.com

indexCacheConfig: |-
  type: memcached
  config:
    addresses:
    - dns+mem-thanos-test.infra.internal:11211
    dns_provider_update_interval: 10s
    max_async_buffer_size: 10000
    max_async_concurrency: 20
    max_get_multi_batch_size: 0
    max_get_multi_concurrency: 100
    max_idle_connections: 100
    max_item_size: 128MiB
    timeout: 500ms

bucketCacheConfig: |-
  type: memcached
  config:
    addresses:
    - dns+mem-thanos-test.infra.internal:11211
    dns_provider_update_interval: 10s
    max_async_buffer_size: 10000
    max_async_concurrency: 20
    max_get_multi_batch_size: 0
    max_get_multi_concurrency: 100
    max_idle_connections: 100
    max_item_size: 128MiB
    timeout: 500ms

query:
  dnsDiscovery:
    sidecarsService: prometheus-operator-kube-p-thanos-discovery
    sidecarsNamespace: monitoring
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  replicaCount: 2
  resources:
    limits:
      memory: 4Gi
    requests:
      cpu: 500m
      memory: 512Mi
  logLevel: debug
  networkPolicy:
    enabled: false

queryFrontend:
  extraFlags:
    - |-
      --query-range.response-cache-config="config":
        "addresses":
        - "dns+mem-thanos-test.infra.internal:11211"
        "dns_provider_update_interval": "10s"
        "max_async_buffer_size": 10000
        "max_async_concurrency": 20
        "max_get_multi_batch_size": 0
        "max_get_multi_concurrency": 100
        "max_idle_connections": 100
        "max_item_size": "128MiB"
        "timeout": "500ms"
      "type": "memcached"
    - |-
      --labels.response-cache-config="config":
        "addresses":
        - "dns+mem-thanos-test.infra.internal:11211"
        "dns_provider_update_interval": "10s"
        "max_async_buffer_size": 10000
        "max_async_concurrency": 20
        "max_get_multi_batch_size": 0
        "max_get_multi_concurrency": 100
        "max_idle_connections": 100
        "max_item_size": "128MiB"
        "timeout": "500ms"
      "type": "memcached"
  replicaCount: 2
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 128Mi
  logLevel: debug
  networkPolicy:
    enabled: false

bucketweb:
  enabled: false
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 64Mi
  networkPolicy:
    enabled: false

compactor:
  enabled: true
  persistence:
    size: 100Gi
  retentionResolutionRaw: 90d
  retentionResolution5m: 90d
  retentionResolution1h: 90d
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi
  logLevel: debug
  networkPolicy:
    enabled: false

storegateway:
  enabled: true
  persistence:
    size: 20Gi
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  podManagementPolicy: Parallel
  replicaCount: 2
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 512Mi
  logLevel: debug
  networkPolicy:
    enabled: false

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    labels:
      release: prometheus-operator

receive:
  enabled: false
  podManagementPolicy: Parallel
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  replicaCount: 1
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 512Mi
  tolerations:
    - key: "arm64-prom"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

What is the expected behavior?

The upgrade should succeed.

What do you see instead?

StatefulSet.apps "thanos-storegateway" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
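The API server only lets an existing StatefulSet change the spec fields listed in that message. Every other field (for example `selector`, `serviceName` or `volumeClaimTemplates`) is immutable, so a chart version that renders a different value for any of them makes `helm upgrade` fail. The sketch below illustrates the rule by comparing two rendered specs at the top level and reporting the changed fields that the API server would reject. The function names and the example specs are hypothetical; this thread does not say which field the affected chart versions changed.

```python
# Hypothetical helper illustrating the StatefulSet update rule quoted in the
# error above: only these top-level spec fields may differ on an update.
MUTABLE_SPEC_FIELDS = frozenset({
    "replicas",
    "ordinals",
    "template",
    "updateStrategy",
    "persistentVolumeClaimRetentionPolicy",
    "minReadySeconds",
})


def changed_fields(old_spec: dict, new_spec: dict) -> set[str]:
    """Top-level spec keys whose values differ (including added/removed keys)."""
    keys = old_spec.keys() | new_spec.keys()
    return {key for key in keys if old_spec.get(key) != new_spec.get(key)}


def forbidden_changes(old_spec: dict, new_spec: dict) -> list[str]:
    """Changed fields outside the mutable set, i.e. the ones the API server rejects."""
    return sorted(changed_fields(old_spec, new_spec) - MUTABLE_SPEC_FIELDS)


if __name__ == "__main__":
    # Hypothetical specs: the new chart changes replicas (allowed) and the
    # selector labels (forbidden).
    old = {"replicas": 2, "serviceName": "thanos-storegateway",
           "selector": {"matchLabels": {"app": "a"}}}
    new = {"replicas": 3, "serviceName": "thanos-storegateway",
           "selector": {"matchLabels": {"app": "b"}}}
    print(forbidden_changes(old, new))
```

An empty list means the upgrade only touches fields that can be patched in place; anything else reproduces the `Forbidden` error.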

Additional information

It looks very similar to issue #29749.
My guess is that the StatefulSet template in the Helm chart was changed in a way that modifies fields that cannot be updated on an existing StatefulSet.

dgomezleon · 2025-01-09T11:57:49Z

Hi @bzlom

You are right. The issue is related to PR #29763, which reverts #29673. Only versions 15.7.26 and 15.7.27 are affected; I was able to upgrade when starting from earlier or later versions. Since you are upgrading from an affected version, you should treat this as a major upgrade.

I hope it helps

bzlom added the tech-issues The user has a technical issue about an application label Jan 7, 2025

github-actions bot added the triage Triage is needed label Jan 7, 2025

github-actions bot assigned javsalgar Jan 7, 2025

carrodher added thanos in-progress labels Jan 8, 2025

github-actions bot removed the triage Triage is needed label Jan 8, 2025

github-actions bot assigned dgomezleon and unassigned javsalgar Jan 8, 2025

bzlom commented Jan 7, 2025 (edited by carrodher)

Loading

dgomezleon commented Jan 9, 2025

Existing Thanos deployments breaking when being updated to any of the latest helm charts #31237

Existing Thanos deployments breaking when being updated to any of the latest helm charts #31237

Comments

bzlom commented Jan 7, 2025 • edited by carrodher Loading

Name and Version

What architecture are you using?

What steps will reproduce the bug?

Are you using any custom parameters or values?

What is the expected behavior?

What do you see instead?

Additional information

dgomezleon commented Jan 9, 2025

bzlom commented Jan 7, 2025 •

edited by carrodher

Loading