error writing object to s3 backend #4361

Open
benmathews opened this issue Nov 20, 2024 · 1 comment
@benmathews

Describe the bug
On October 24, my tempo-ingester pods started throwing the errors below, and ingester and compactor latency increased quite a bit (from a couple hundred ms to multiple seconds).

level=error caller=flush.go:233 org_id=single-tenant msg="error performing op in flushQueue" op=1 block=77c398c8-cc47-4764-a995-fe0de5760e7d attempts=1 err="error copying block from local to remote backend: error writing object to s3 backend, object tempo/single-tenant/77c398c8-cc47-4764-a995-fe0de5760e7d/data.parquet: context deadline exceeded"

This did not align with any software, config, or network change that I can tell. We are still writing to S3, but slowly. I can't tell whether the deadline-exceeded blocks get retried or dropped.
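
To narrow down whether the slowdown is in Tempo itself or in the path to S3, one quick check is to time a raw upload from an ingester pod. This is only a sketch: the pod name, namespace, bucket placeholder, and the availability of the aws CLI inside the container are all assumptions, so adjust it to your environment.

# Hypothetical pod, namespace, and bucket names; replace with your own.
# Create a ~1 GiB test object inside an ingester pod, roughly the size of a block's data.parquet.
kubectl exec -n tempo tempo-ingester-0 -- dd if=/dev/zero of=/tmp/put-test bs=1M count=1024

# Time the upload from the same pod (assumes the aws CLI is available in the image or via a debug container).
kubectl exec -n tempo tempo-ingester-0 -- \
  sh -c 'time aws s3 cp /tmp/put-test s3://<bucket>/tempo/put-test --endpoint-url https://s3.us-west-2.amazonaws.com'

# Remove the test object afterwards.
kubectl exec -n tempo tempo-ingester-0 -- aws s3 rm s3://<bucket>/tempo/put-test --endpoint-url https://s3.us-west-2.amazonaws.com

If the raw PUT is already multi-second, that points at the network or S3 rather than Tempo.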

To Reproduce
Steps to reproduce the behavior:
Normal operation reproduces the behavior

Environment:

  • Infrastructure: Kubernetes running on bare metal. The local storage is provided by a Pure appliance. Long-term storage is in S3.
  • Deployment tool: Helm
➜ helm history tempo
REVISION	UPDATED                 	STATUS    	CHART                   	APP VERSION	DESCRIPTION     
140     	Mon Oct 14 16:26:31 2024	superseded	tempo-distributed-1.18.4	2.6.0      	Upgrade complete
141     	Wed Nov 20 14:32:00 2024	superseded	tempo-distributed-1.22.1	2.6.0      	Upgrade complete
142     	Wed Nov 20 14:57:16 2024	deployed  	tempo-distributed-1.22.1	2.6.0      	Upgrade complete

Additional Context
values.yaml overrides

USER-SUPPLIED VALUES:
compactor:
  config:
    compaction:
      max_time_per_tenant: 15m
  replicas: 12
  resources:
    requests:
      cpu: 600m
      memory: 2Gi
distributor:
  replicas: 6
  resources:
    requests:
      cpu: 2
      memory: 1500Mi
ingester:
  persistence:
    enabled: true
    inMemory: false
    size: 30Gi
    storageClass: null
  replicas: 30
  resources:
    requests:
      cpu: 1
      memory: 5Gi
memcached:
  replicas: 3
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
memcachedExporter:
  enabled: true
metaMonitoring:
  serviceMonitor:
    enabled: true
metricsGenerator:
  enabled: false
prometheusRule:
  enabled: true
querier:
  config:
    max_concurrent_queries: 40
    search:
      query_timeout: 1m
    trace_by_id:
      query_timeout: 1m
  replicas: 40
  resources:
    requests:
      cpu: 50m
      memory: 2Gi
query_frontend:
  max_outstanding_per_tenant: 4000
queryFrontend:
  config:
    search:
      concurrent_jobs: 5000
  replicas: 2
  resources:
    requests:
      cpu: 10m
      memory: 150Mi
reportingEnabled: false
server:
  http_server_read_timeout: 4m
  http_server_write_timeout: 4m
storage:
  trace:
    backend: s3
    pool:
      queue_depth: 50000
    s3:
      access_key: *******************
      bucket: *****************
      endpoint: s3.us-west-2.amazonaws.com
      prefix: tempo
      secret_key: *********************
tempo:
  structuredConfig:
    overrides:
      defaults:
        ingestion:
          burst_size_bytes: 800000000
          max_traces_per_user: 3000000
          rate_limit_bytes: 600000000
traces:
  otlp:
    grpc:
      enabled: true
    http:
      enabled: true
@joe-elliott
Member

If compactors and ingesters were simultaneously having issues speaking with object storage, this suggests a networking or object storage issue.

I can't tell if the deadline exceeded blocks get retried or dropped.

They are retried.
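
One way to confirm the retries eventually succeed, rather than piling up, is to watch the ingester flush metrics. The queries below are a sketch: the Prometheus address is a placeholder, and the metric names (tempo_ingester_flush_retries_total, tempo_ingester_failed_flushes_total, tempo_ingester_flush_duration_seconds) are assumed from Tempo's ingester flush path and may differ by version, so verify them against your scrape.

# Placeholder Prometheus address; adjust to your monitoring setup.
PROM=http://prometheus.monitoring:9090

# Rate of flush retries and of failed flushes across ingesters.
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=sum(rate(tempo_ingester_flush_retries_total[5m]))'
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=sum(rate(tempo_ingester_failed_flushes_total[5m]))'

# p99 flush duration; this should track the multi-second latencies described above.
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(tempo_ingester_flush_duration_seconds_bucket[5m])) by (le))'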
