Exporter does not provide any of the DCGM_FI_DEV_*_UTIL metrics #430

Open
kt-pham opened this issue Dec 4, 2024 · 4 comments
Labels
question: Further information is requested

Comments

@kt-pham

kt-pham commented Dec 4, 2024

Do I need to enable anything in order to get the following metrics?

DCGM_FI_DEV_GPU_UTIL,      gauge, GPU utilization (in %).
DCGM_FI_DEV_MEM_COPY_UTIL, gauge, Memory utilization (in %).
DCGM_FI_DEV_ENC_UTIL,      gauge, Encoder utilization (in %).
DCGM_FI_DEV_DEC_UTIL,      gauge, Decoder utilization (in %).

I am unable to see these specific metrics from my exporter. @glowkey could you help answer this question?

kt-pham added the "question" label on Dec 4, 2024
@danielserrao

I'm having the same issue. Some more details:

  • I see those metrics in the exporter-metrics-config-map ConfigMap, but they still don't appear when I query the /metrics path.
  • Helm Chart version: 3.6.1
  • Container image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.9-3.6.1-ubuntu22.04
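
For anyone trying to confirm the same symptom, here is a minimal sketch that checks which of the four metrics are absent from the text returned by /metrics. The helper names and the SAMPLE payload are made up for illustration and are not taken from the exporter; the parsing only assumes the usual Prometheus text format (comment lines start with `#`, sample lines start with the metric name).

```python
import re

# The four metrics asked about in this issue.
EXPECTED = [
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_MEM_COPY_UTIL",
    "DCGM_FI_DEV_ENC_UTIL",
    "DCGM_FI_DEV_DEC_UTIL",
]


def exported_metric_names(metrics_text):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # A sample line starts with the metric name, then labels or a value.
        match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(metrics_text, expected=EXPECTED):
    """Return the expected metric names that have no sample in the text."""
    present = exported_metric_names(metrics_text)
    return [name for name in expected if name not in present]


# Made-up payload for illustration only, not real exporter output.
SAMPLE = """\
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
DCGM_FI_DEV_SM_CLOCK{gpu="0"} 1410
DCGM_FI_DEV_GPU_UTIL{gpu="0"} 37
"""

print(missing_metrics(SAMPLE))
```

To run it against a live exporter, replace SAMPLE with the body fetched from the exporter's /metrics path (for example with `urllib.request.urlopen`); the host and port depend on your deployment.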

@glowkey
Collaborator

glowkey commented Dec 12, 2024

These metrics should be supported by all GPU models but are not supported for MIG configurations. Do you see the metrics with 'dcgmi dmon'?
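
A rough sketch of how the 'dcgmi dmon' check could be scripted. The numeric field IDs below are quoted from memory of dcgm_fields.h and are an assumption; please verify them against your DCGM version before relying on them. The `dmon_command` helper is hypothetical and only builds the argument list, it does not run anything.

```python
import shlex

# Assumed field IDs (as recalled from dcgm_fields.h); verify for your DCGM version.
FIELD_IDS = {
    "DCGM_FI_DEV_GPU_UTIL": 203,
    "DCGM_FI_DEV_MEM_COPY_UTIL": 204,
    "DCGM_FI_DEV_ENC_UTIL": 206,
    "DCGM_FI_DEV_DEC_UTIL": 207,
}


def dmon_command(field_names, count=1):
    """Build a 'dcgmi dmon' argument list that samples the given fields."""
    ids = ",".join(str(FIELD_IDS[name]) for name in field_names)
    # -e selects the field IDs to watch, -c limits the number of samples.
    return ["dcgmi", "dmon", "-e", ids, "-c", str(count)]


cmd = dmon_command(list(FIELD_IDS))
print(shlex.join(cmd))
```

The printed command can be pasted into a shell on a node (or container) where dcgmi is available, or passed to `subprocess.run(cmd)`. If the fields come back empty there as well, the gap is below the exporter rather than in its configuration.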

@kt-pham
Author

kt-pham commented Dec 12, 2024

Is there any way for us to see usage metrics when MIG is enabled?

@glowkey
Collaborator

glowkey commented Dec 12, 2024

Use DCGM_FI_PROF_GR_ENGINE_ACTIVE and DCGM_FI_PROF_DRAM_ACTIVE, which are reported for MIG devices. I'd encourage you to look through the DCGM_FI_PROF* family of metrics, otherwise known as DCP metrics.
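
As a sketch of what that could look like in the exporter's metrics list, the snippet below prints entries for the two DCP metrics in the same "name, type, help" layout used in the original question. The help strings and helper names are illustrative assumptions, not copied from the shipped configuration. Note that, unlike the *_UTIL metrics above, these DCP fields are ratios between 0 and 1 rather than percentages, so a conversion helper is included.

```python
# Illustrative entries; the help text is an assumption, not the shipped wording.
DCP_METRICS = [
    ("DCGM_FI_PROF_GR_ENGINE_ACTIVE", "gauge",
     "Ratio of time the graphics engine is active."),
    ("DCGM_FI_PROF_DRAM_ACTIVE", "gauge",
     "Ratio of cycles the device memory interface is active."),
]


def to_metrics_csv(metrics):
    """Render (name, type, help) tuples as 'name, type, help' lines."""
    return "\n".join(
        f"{name}, {kind}, {help_text}" for name, kind, help_text in metrics
    )


def ratio_to_percent(ratio):
    """Convert a 0..1 activity ratio to a percentage for dashboards."""
    return ratio * 100.0


print(to_metrics_csv(DCP_METRICS))
print(ratio_to_percent(0.5))
```

The printed lines would go alongside the existing entries in the metrics ConfigMap; whether the exporter needs a restart to pick them up depends on how it is deployed.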

3 participants