
Cgroups #29621

Merged: 8 commits merged into open-telemetry:main from cgroups on Feb 16, 2024

Conversation

@atoulme (Contributor) commented Dec 1, 2023

Description:
Adds process.cgroup resource attribute to process metrics

Link to tracking Issue:
Fixes #29282

@atoulme atoulme requested a review from dmitryax as a code owner December 1, 2023 23:03
@atoulme atoulme requested a review from a team December 1, 2023 23:03
@crobert-1 (Member) commented:

CI test is hitting a panic that looks like it's caused by this change:

--- FAIL: TestScrapeMetrics_Filtered (0.00s)
    --- FAIL: TestScrapeMetrics_Filtered/No_Filter (0.00s)
panic: interface conversion: processscraper.processHandle is *processscraper.processHandleMock, not *process.Process [recovered]
	panic: interface conversion: processscraper.processHandle is *processscraper.processHandleMock, not *process.Process

@atoulme (Contributor, Author) commented Dec 7, 2023

Yes, I need to work on this some more. The current interface doesn't work well for accessing fields on the underlying struct.
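
The panic happens because the scraper type-asserts the processHandle interface back to the concrete *process.Process to reach its fields, which fails when the handle is a mock. A minimal sketch of one way out, using hypothetical names rather than the PR's actual code, is to expose the needed data through the interface itself so the mock can satisfy it too:

    package processscraper

    import (
        "fmt"
        "os"
        "strings"

        "github.com/shirou/gopsutil/v3/process"
    )

    // processHandle abstracts the process so tests can substitute a mock.
    // Cgroup is a hypothetical accessor added for illustration; callers no
    // longer need to assert down to *process.Process.
    type processHandle interface {
        Cgroup() (string, error)
    }

    // gopsutilHandle wraps the real process and reads /proc/<pid>/cgroup,
    // one plausible way to supply the value on Linux.
    type gopsutilHandle struct {
        proc *process.Process
    }

    func (h *gopsutilHandle) Cgroup() (string, error) {
        data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", h.proc.Pid))
        if err != nil {
            return "", err
        }
        return strings.TrimSpace(string(data)), nil
    }

    // processHandleMock returns canned data and satisfies the same
    // interface, so a filtered-metrics test no longer panics.
    type processHandleMock struct {
        cgroup string
    }

    func (m *processHandleMock) Cgroup() (string, error) {
        return m.cgroup, nil
    }

With both types implementing the interface, the scraper stays testable without any type assertions.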

github-actions (bot) commented:

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 22, 2023
@crobert-1 crobert-1 removed the Stale label Jan 2, 2024
github-actions (bot) commented:

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 17, 2024
@github-actions github-actions bot requested a review from braydonk January 22, 2024 06:33
@atoulme atoulme force-pushed the cgroups branch 2 times, most recently from 7c2f712 to ba19244 on January 22, 2024 22:57
@github-actions github-actions bot removed the Stale label Jan 23, 2024
@atoulme (Contributor, Author) commented Jan 24, 2024

Sorry, this took a long time. This is now ready for review.

@@ -31,6 +31,10 @@ type Config struct {
// the collector does not have permission for.
MuteProcessIOError bool `mapstructure:"mute_process_io_error,omitempty"`

// MuteProcessCgroupError is a flag that will mute the error encountered when trying to read the cgroup of a process
// the collector does not have permission for.
MuteProcessCgroupError bool `mapstructure:"mute_process_cgroup_error,omitempty"`
A reviewer (Member) commented:

Do we need this option from the start? It's an optional metric and can be disabled if it results in errors

@andrzej-stencel (Member) commented Feb 8, 2024:

I think we still need an option to mute errors even for optional metrics.

The problem is (I think) that the errors in the process scraper are usually huge, reporting an error for every process on the system, and repetitive, reporting the same thing on every scrape. Perhaps going forward we could do something more clever than adding another "mute" option. Here are some thoughts (a rough sketch of the first two follows the list):

  1. Make the error messages shorter by aggregating the duplicate error messages,
  2. Only display a specific type of error when it happens for the first time,
  3. Add metrics counting the occurrences of each type of error, especially important if we do 2. as the default behavior.
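
As a rough, hypothetical sketch of ideas 1 and 2 (none of these names come from the PR): count duplicate messages within a scrape, and suppress messages already reported on earlier scrapes:

    package processscraper

    import (
        "fmt"
        "sync"
    )

    // errorDeduper collapses duplicate scrape errors. Purely illustrative.
    type errorDeduper struct {
        mu   sync.Mutex
        seen map[string]int // message -> occurrences across all scrapes
    }

    func newErrorDeduper() *errorDeduper {
        return &errorDeduper{seen: make(map[string]int)}
    }

    // Filter returns the errors worth logging: each distinct message is
    // reported once, annotated with how often it occurred this scrape,
    // and messages already seen on previous scrapes are dropped.
    func (d *errorDeduper) Filter(errs []error) []error {
        d.mu.Lock()
        defer d.mu.Unlock()

        counts := make(map[string]int)
        for _, err := range errs {
            counts[err.Error()]++
        }

        var out []error
        for msg, n := range counts {
            first := d.seen[msg] == 0
            d.seen[msg] += n
            if first {
                out = append(out, fmt.Errorf("%s (x%d this scrape)", msg, n))
            }
        }
        return out
    }

Idea 3 would complement this: exporting the accumulated counters as a metric keeps the suppressed errors observable.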

A Contributor commented:

A solution I'd come up with for this is a new type of error that a scraper can report so it gets logged at a different level. Combined with option 3, that would be a pretty good outcome, I think: metrics would be reported for the different types of errors, and the errors themselves could be logged at debug level by the scraper controller.

This is the issue I have open on the collector repo with accompanying PR: open-telemetry/opentelemetry-collector#8293
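
As a rough illustration of that idea (hypothetical types, not the actual API proposed in open-telemetry/opentelemetry-collector#8293), an error could carry a suggested log level that the scraper controller honors when logging:

    package scrapererror

    import (
        "errors"

        "go.uber.org/zap/zapcore"
    )

    // severityError wraps an error with a suggested log level so the
    // scraper controller can demote repetitive failures to debug.
    type severityError struct {
        err   error
        level zapcore.Level
    }

    func (e *severityError) Error() string { return e.err.Error() }
    func (e *severityError) Unwrap() error { return e.err }

    // NewSeverityError attaches a suggested log level to err.
    func NewSeverityError(err error, level zapcore.Level) error {
        return &severityError{err: err, level: level}
    }

    // SeverityOf returns the suggested level, defaulting to ErrorLevel
    // for plain errors.
    func SeverityOf(err error) zapcore.Level {
        var se *severityError
        if errors.As(err, &se) {
            return se.level
        }
        return zapcore.ErrorLevel
    }

The controller could then log at SeverityOf(err) instead of hard-coding the error level, while the counters from idea 3 still record every occurrence.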

@dmitryax dmitryax merged commit 3e385c8 into open-telemetry:main Feb 16, 2024
142 checks passed
@github-actions github-actions bot added this to the next release milestone Feb 16, 2024
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this pull request Mar 13, 2024
**Description:**
Adds `process.cgroup` resource attribute to process metrics

**Link to tracking Issue:**
Fixes open-telemetry#29282

---------

Co-authored-by: Andrzej Stencel <[email protected]>
@cforce commented Apr 26, 2024

Where and how do I enable the resource attribute process.cgroup: true?

@crobert-1 (Member) commented:

@cforce You're wondering how to enable the process.cgroup resource attribute?

The documentation shares a little more information; you'd want to add something like this to your hostmetrics receiver configuration:

resource_attributes:
  process.cgroup:
    enabled: true

@atoulme atoulme deleted the cgroups branch July 9, 2024 22:59
Successfully merging this pull request may close these issues: [receiver/hostmetrics] Support collecting process cgroup as dimension (#29282)

7 participants