nvidia-smi-exporter

Cross-platform Prometheus exporter for nvidia-smi metrics, with a Grafana dashboard

Build

# Set OS to change the target operating system (default: linux)
# Set ARCH to change the target architecture (default: amd64)
make nvidia-smi-exporter-binary
# output: build/VERSION/OS-ARCH/nvidia-smi-exporter

Run

# Ensure nvidia-smi is in your PATH, or export NVIDIA_SMI with the path to the binary
nvidia-smi-exporter serve

Default port is 9101

Launch at startup on Windows

1. Build the exporter for Windows.
2. Move the exporter to C:\Windows\System32.
3. In C:\Windows\System32, create a batch file containing: nvidia-smi-exporter.exe serve
4. Press WINDOWS_KEY + R, then run shell:common startup. This should open Windows Explorer.
5. Drag the batch file into that Explorer window. This should create a shortcut rather than move the original.
6. Restart, then check that the metrics can be scraped.

Metrics Reported per GPU

Not every GPU supports every metric. See the Dashboard section for how to remove unsupported metrics from your dashboard.

nvidia_clocks_applications_gr
nvidia_clocks_applications_graphics
nvidia_clocks_applications_mem
nvidia_clocks_applications_memory
nvidia_clocks_current_graphics
nvidia_clocks_current_memory
nvidia_clocks_current_sm
nvidia_clocks_current_video
nvidia_clocks_default_applications_gr
nvidia_clocks_default_applications_graphics
nvidia_clocks_default_applications_mem
nvidia_clocks_default_applications_memory
nvidia_clocks_gr
nvidia_clocks_max_gr
nvidia_clocks_max_graphics
nvidia_clocks_max_mem
nvidia_clocks_max_memory
nvidia_clocks_max_sm
nvidia_clocks_mem
nvidia_clocks_sm
nvidia_clocks_throttle_reasons_gpu_idle
nvidia_clocks_throttle_reasons_hw_power_brake_slowdown
nvidia_clocks_throttle_reasons_hw_slowdown
nvidia_clocks_throttle_reasons_hw_thermal_slowdown
nvidia_clocks_throttle_reasons_sw_thermal_slowdown
nvidia_clocks_throttle_reasons_sync_boost
nvidia_clocks_video
nvidia_driver_version
nvidia_ecc_errors_corrected_aggregate_device_memory
nvidia_ecc_errors_corrected_aggregate_l1_cache
nvidia_ecc_errors_corrected_aggregate_l2_cache
nvidia_ecc_errors_corrected_aggregate_register_file
nvidia_ecc_errors_corrected_aggregate_texture_memory
nvidia_ecc_errors_corrected_aggregate_total
nvidia_ecc_errors_corrected_volatile_l1_cache
nvidia_ecc_errors_corrected_volatile_l2_cache
nvidia_ecc_errors_corrected_volatile_register_file
nvidia_ecc_errors_corrected_volatile_texture_memory
nvidia_ecc_errors_corrected_volatile_total
nvidia_ecc_errors_uncorrected_aggregate_device_memory
nvidia_ecc_errors_uncorrected_aggregate_l1_cache
nvidia_ecc_errors_uncorrected_aggregate_l2_cache
nvidia_ecc_errors_uncorrected_aggregate_register_file
nvidia_ecc_errors_uncorrected_aggregate_texture_memory
nvidia_ecc_errors_uncorrected_aggregate_total
nvidia_ecc_errors_uncorrected_volatile_device_memory
nvidia_ecc_errors_uncorrected_volatile_l1_cache
nvidia_ecc_errors_uncorrected_volatile_l2_cache
nvidia_ecc_errors_uncorrected_volatile_register_file
nvidia_ecc_errors_uncorrected_volatile_texture_memory
nvidia_ecc_errors_uncorrected_volatile_total
nvidia_encoder_stats_averageFps
nvidia_encoder_stats_averageLatency
nvidia_encoder_stats_sessionCount
nvidia_enforced_power_limit
nvidia_fan_speed
nvidia_memory_free
nvidia_memory_total
nvidia_memory_used
nvidia_power_default_limit
nvidia_power_draw
nvidia_power_limit
nvidia_power_management
nvidia_power_max_limit
nvidia_power_min_limit
nvidia_pstate
nvidia_pstate_unparseable
nvidia_query_field_unsupported
nvidia_retired_pages_dbe
nvidia_retired_pages_double_bit_count
nvidia_retired_pages_pending
nvidia_retired_pages_sbe
nvidia_retired_pages_single_bit_ecc_count
nvidia_temperature_gpu
nvidia_unknown_error
nvidia_unparseable_query_result_value
nvidia_utilization_gpu
nvidia_utilization_memory

Dashboard

The dashboard is generated with jsonnet and the grafana/grafonnet-lib library. If your GPU does not support all of the metrics, you may want to edit dashboard.jsonnet to remove and/or reorder some of the metric names. Then regenerate the dashboard with the following command:

jsonnet -J ../../grafana/grafonnet-lib ./dashboard.jsonnet > ./dashboard.json

Prometheus example config

scrape_configs:
  - job_name: "nvidia_gpu"
    static_configs:
      - targets: ['HOST:9101'] # default port is 9101

Repository: glynternet/nvidia-smi-exporter

Folders and files

Latest commit

History

Repository files navigation

nvidia-smi-exporter

Build

Run

Launch at startup on Windows

Metrics Reported per GPU

Dashboard

Prometheus example config

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages