Existing Thanos deployments break when upgrading to any of the latest Helm charts #31237

Open
bzlom opened this issue Jan 7, 2025 · 1 comment
Labels: in-progress, tech-issues, thanos

bzlom commented Jan 7, 2025

Name and Version

thanos/15.9.2

What architecture are you using?

arm64

What steps will reproduce the bug?

  1. Deploy an earlier version of the chart (for example, 15.7.27, which is the version I am running).
  2. Try to upgrade to version 15.9.x.

Are you using any custom parameters or values?

global:
  imagePullSecrets:
    - some_secret
  security:
    allowInsecureImages: true

image:
  registry: some_registry
  repository: thanos/thanos
  tag: v0.37.2
  pullSecrets:
    - some_secret

objstoreConfig: |-
  type: s3
  config:
    bucket: some_bucket
    endpoint: s3.eu-west-1.amazonaws.com

indexCacheConfig: |-
  type: memcached
  config:
    addresses:
    - dns+mem-thanos-test.infra.internal:11211
    dns_provider_update_interval: 10s
    max_async_buffer_size: 10000
    max_async_concurrency: 20
    max_get_multi_batch_size: 0
    max_get_multi_concurrency: 100
    max_idle_connections: 100
    max_item_size: 128MiB
    timeout: 500ms

bucketCacheConfig: |-
  type: memcached
  config:
    addresses:
    - dns+mem-thanos-test.infra.internal:11211
    dns_provider_update_interval: 10s
    max_async_buffer_size: 10000
    max_async_concurrency: 20
    max_get_multi_batch_size: 0
    max_get_multi_concurrency: 100
    max_idle_connections: 100
    max_item_size: 128MiB
    timeout: 500ms

query:
  dnsDiscovery:
    sidecarsService: prometheus-operator-kube-p-thanos-discovery
    sidecarsNamespace: monitoring
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  replicaCount: 2
  resources:
    limits:
      memory: 4Gi
    requests:
      cpu: 500m
      memory: 512Mi
  logLevel: debug
  networkPolicy:
    enabled: false

queryFrontend:
  extraFlags:
    - |-
      --query-range.response-cache-config="config":
        "addresses":
        - "dns+mem-thanos-test.infra.internal:11211"
        "dns_provider_update_interval": "10s"
        "max_async_buffer_size": 10000
        "max_async_concurrency": 20
        "max_get_multi_batch_size": 0
        "max_get_multi_concurrency": 100
        "max_idle_connections": 100
        "max_item_size": "128MiB"
        "timeout": "500ms"
      "type": "memcached"
    - |-
      --labels.response-cache-config="config":
        "addresses":
        - "dns+mem-thanos-test.infra.internal:11211"
        "dns_provider_update_interval": "10s"
        "max_async_buffer_size": 10000
        "max_async_concurrency": 20
        "max_get_multi_batch_size": 0
        "max_get_multi_concurrency": 100
        "max_idle_connections": 100
        "max_item_size": "128MiB"
        "timeout": "500ms"
      "type": "memcached"
  replicaCount: 2
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 128Mi
  logLevel: debug
  networkPolicy:
    enabled: false

bucketweb:
  enabled: false
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 64Mi
  networkPolicy:
    enabled: false

compactor:
  enabled: true
  persistence:
    size: 100Gi
  retentionResolutionRaw: 90d
  retentionResolution5m: 90d
  retentionResolution1h: 90d
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi
  logLevel: debug
  networkPolicy:
    enabled: false

storegateway:
  enabled: true
  persistence:
    size: 20Gi
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  tolerations:
    - key: "arm64"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  podManagementPolicy: Parallel
  replicaCount: 2
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 512Mi
  logLevel: debug
  networkPolicy:
    enabled: false

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    labels:
      release: prometheus-operator

receive:
  enabled: false
  podManagementPolicy: Parallel
  podAnnotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxxxxxxx:role/some_role
  replicaCount: 1
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 300m
      memory: 512Mi
  tolerations:
    - key: "arm64-prom"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

What is the expected behavior?

The upgrade should succeed.

What do you see instead?

StatefulSet.apps "thanos-storegateway" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
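The error lists the only top-level `spec` fields that Kubernetes allows to change on an existing StatefulSet. As a minimal Python sketch, assuming both the currently deployed and the newly rendered StatefulSet `spec` are available as plain dicts (for example, parsed from `kubectl get statefulset -o yaml` and `helm template` output), the following compares them and reports which changed fields would trigger this error. The helper name and sample values are hypothetical. The comparison only looks at top-level keys, and the live object carries server-side defaults that a rendered manifest may omit, so treat any hit as a candidate to inspect rather than proof.

```python
from typing import Any

# Top-level StatefulSet spec fields that the API server allows to change on
# update, taken from the error message above. Any other change is rejected.
MUTABLE_STATEFULSET_FIELDS = frozenset({
    "replicas",
    "ordinals",
    "template",
    "updateStrategy",
    "persistentVolumeClaimRetentionPolicy",
    "minReadySeconds",
})


def forbidden_spec_changes(old_spec: dict[str, Any], new_spec: dict[str, Any]) -> list[str]:
    """Return top-level spec fields that differ between two StatefulSet specs
    and are not in the mutable set (hypothetical helper for illustration)."""
    changed = {
        key
        for key in old_spec.keys() | new_spec.keys()
        if old_spec.get(key) != new_spec.get(key)
    }
    return sorted(changed - MUTABLE_STATEFULSET_FIELDS)


# Hypothetical specs: only "replicas" and "selector" differ.
old = {"replicas": 2, "serviceName": "headless", "selector": {"matchLabels": {"app": "a"}}}
new = {"replicas": 3, "serviceName": "headless", "selector": {"matchLabels": {"app": "b"}}}
print(forbidden_spec_changes(old, new))  # ['selector']
```

Here the `replicas` change is allowed, so only `selector` is reported as a field the API server would refuse to update.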

Additional information

This looks very similar to issue #29749.
I suspect the StatefulSet template in the Helm chart was modified in a way that changes some of these immutable fields.

bzlom added the tech-issues label Jan 7, 2025
github-actions bot added the triage label Jan 7, 2025
github-actions bot removed the triage label Jan 8, 2025
github-actions bot assigned dgomezleon and unassigned javsalgar Jan 8, 2025
dgomezleon (Member)

Hi @bzlom

You are right. The issue is related to PR #29763 (which reverts #29673). Only versions 15.7.26 and 15.7.27 are affected; I was able to upgrade successfully from both earlier and later versions. So in this case, you should treat this as a major upgrade.

I hope this helps.
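Based on the maintainer's statement that only chart versions 15.7.26 and 15.7.27 are affected, a check like the following minimal Python sketch could flag releases that need the major-upgrade treatment before running `helm upgrade`. The helper name is hypothetical, and it assumes the input is either a bare version or a `<chart>-<version>` string such as the one shown in the CHART column of `helm list`.

```python
# Chart versions the maintainer reports as affected (related to PR #29763,
# which reverts #29673). Releases on these versions need a major-upgrade procedure.
AFFECTED_CHART_VERSIONS = frozenset({"15.7.26", "15.7.27"})


def needs_major_upgrade(installed_chart: str) -> bool:
    """Return True if the installed chart version is one of the affected ones.

    Accepts a bare version ("15.7.27") or a "<chart>-<version>" string
    ("thanos-15.7.27"); the text after the last "-" is taken as the version.
    """
    version = installed_chart.rsplit("-", 1)[-1]
    return version in AFFECTED_CHART_VERSIONS


for chart in ("thanos-15.7.25", "thanos-15.7.27", "15.9.2"):
    print(chart, needs_major_upgrade(chart))
```

The split on the last `-` is deliberately simple: a version with a pre-release suffix (for example `15.7.27-rc1`) would be mis-parsed, which is acceptable here because only plain release versions are involved.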
