Docker resource detection fails after system reboot #34761

Open
litetex opened this issue Aug 20, 2024 · 5 comments
Labels
bug (Something isn't working), processor/resourcedetection (Resource detection processor)

Comments


litetex commented Aug 20, 2024

Component(s)

processor/resourcedetection, processor/resourcedetection/internal/docker

What happened?

Description

I'm running the collector from this project in a container, with the docker detector of the resourcedetection processor enabled.

The corresponding config looks like this:

processors:
  resourcedetection:
    detectors: [docker]

After a system restart the resource detector loses its functionality (the labels from the Docker host are no longer picked up) and the following message shows up in the log:

warn    internal/resourcedetection.go:130       failed to detect resource       {"kind": "processor", "name": "resourcedetection", "pipeline": "logs", "error": "failed getting OS type: failed to fetch Docker OS type: Get \"http://%2Fvar%2Frun%2Fdocker.sock/v1.46/info\": context deadline exceeded"}

This is likely because Docker is still starting up when the collector runs its detection.

Possible fixes:

  • Retry resource detection for a limited amount of time (a few minutes?)
  • Crash the container when this happens so that it can get restarted automatically due to the restart policy

It would be great to also make this configurable.

Collector version

v0.107.0

Environment information

Environment

Ubuntu 24.04 LTS
Docker CE 27.1.2

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

litetex added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Aug 20, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the processor/resourcedetection (Resource detection processor) label on Aug 20, 2024
@rogercoll
Contributor

Crash the container when this happens so that it can get restarted automatically due to the restart policy

Maybe a new error_mode configuration option, like the one in the transform processor, would help. When set to propagate, it would return any detector error and crash the collector.

@rogercoll
Contributor

Another viable solution would be what is being proposed here: #34876

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 11, 2024
@litetex
Author

litetex commented Dec 14, 2024

explaining why it is still relevant

It's still not fixed, which should be reason enough.

github-actions bot removed the Stale label on Dec 15, 2024
3 participants