Signed-off-by: Zhang, Chaoyue (Jack) <[email protected]>
Zhang, Chaoyue (Jack) committed Jun 7, 2024 (1 parent 384d8c7, commit 718b7d5)
Showing 21 changed files with 854 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
<p><img src="https://www.circonus.com/wp-content/uploads/2015/03/sol-icon-itOps.png" alt="graph logo" title="graph" align="right" height="60" /></p> | ||
|
||
# Ansible Role: Nvidia GPU exporter | ||
|
||
## Description | ||
|
||
Deploy the Prometheus [Nvidia GPU exporter](https://github.com/utkuozdemir/nvidia_gpu_exporter) using Ansible. | ||
|
||
## Requirements | ||
|
||
- Ansible >= 2.9 (It might work on previous versions, but we cannot guarantee it) | ||
- gnu-tar on Mac deployer host (`brew install gnu-tar`) | ||
- Passlib is required when using the basic authentication feature (`pip install passlib[bcrypt]`) | ||
|
||
## Role Variables | ||
|
||
All variables which can be overridden are stored in [defaults/main.yml](defaults/main.yml) file as well as in [meta/argument_specs.yml](meta/argument_specs.yml). | ||
Please refer to the [collection docs](https://prometheus-community.github.io/ansible/branch/main/nvidia_gpu_exporter_role.html) for description and default values of the variables. | ||
|
||
## Example | ||
|
||
### Playbook | ||
|
||
Use it in a playbook as follows: | ||
|
||
```yaml | ||
- hosts: all | ||
roles: | ||
- prometheus.prometheus.nvidia_gpu_exporter | ||
``` | ||
### TLS config | ||
Before running the nvidia_gpu_exporter role, provision your own certificate and key. | ||
```yaml | ||
- hosts: all | ||
pre_tasks: | ||
- name: Create nvidia_gpu_exporter cert dir | ||
file: | ||
path: "/etc/nvidia_gpu_exporter" | ||
state: directory | ||
owner: root | ||
group: root | ||
|
||
- name: Create cert and key | ||
openssl_certificate: | ||
path: /etc/nvidia_gpu_exporter/tls.cert | ||
csr_path: /etc/nvidia_gpu_exporter/tls.csr | ||
privatekey_path: /etc/nvidia_gpu_exporter/tls.key | ||
provider: selfsigned | ||
roles: | ||
- prometheus.prometheus.nvidia_gpu_exporter | ||
vars: | ||
nvidia_gpu_exporter_tls_server_config: | ||
cert_file: /etc/nvidia_gpu_exporter/tls.cert | ||
key_file: /etc/nvidia_gpu_exporter/tls.key | ||
nvidia_gpu_exporter_basic_auth_users: | ||
randomuser: examplepassword | ||
``` | ||
### Demo site | ||
We provide an example site that demonstrates a full monitoring solution based on prometheus and grafana. The repository with code and links to running instances is [available on github](https://github.com/prometheus/demo-site) and the site is hosted on [DigitalOcean](https://digitalocean.com). | ||
## Local Testing | ||
The preferred way of locally testing the role is to use Docker and [molecule](https://github.com/ansible-community/molecule) (v3.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system. Running your tests is as simple as executing `molecule test`. | ||
|
||
## Continuous Integration | ||
|
||
Combining molecule and CircleCI allows us to test how new PRs behave across multiple Ansible versions and operating systems. It also allows us to create test scenarios for different role configurations. The resulting test matrix is fairly large and can take longer than local testing, so please be patient. | ||
|
||
## Contributing | ||
|
||
See [contributor guideline](CONTRIBUTING.md). | ||
|
||
## Troubleshooting | ||
|
||
See [troubleshooting](TROUBLESHOOTING.md). | ||
|
||
## License | ||
|
||
This project is licensed under MIT License. See [LICENSE](/LICENSE) for more details. |
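
Review note on the README's TLS config example: once the role is deployed with `nvidia_gpu_exporter_tls_server_config` and `nvidia_gpu_exporter_basic_auth_users`, a client has to send HTTP Basic credentials over HTTPS. The sketch below shows one way to do that with the Python standard library only. The function names are hypothetical, the URL in the usage comment assumes the exporter listens on port 9835 on the local host, and certificate verification is disabled only because the example certificate is self-signed.

```python
import base64
import ssl
import urllib.request


def build_scrape_request(url, user, password):
    """Return a Request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def insecure_context():
    """TLS context that skips verification (assumption: self-signed test cert)."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def scrape(url, user, password):
    """Fetch the metrics page as text. Performs network I/O."""
    request = build_scrape_request(url, user, password)
    with urllib.request.urlopen(request, context=insecure_context(), timeout=5) as resp:
        return resp.read().decode("utf-8")


# Hypothetical usage against a host prepared as in the README example:
# print(scrape("https://localhost:9835/metrics", "randomuser", "examplepassword"))
```

With a certificate issued by a trusted CA, `ssl.create_default_context()` without the two overrides would be the safer choice.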
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
--- | ||
nvidia_gpu_exporter_version: 1.2.0 | ||
|
||
nvidia_gpu_exporter_binary_url: "https://github.com/{{ _nvidia_gpu_exporter_repo }}/releases/download/v{{ nvidia_gpu_exporter_version }}/\ | ||
nvidia_gpu_exporter_{{ nvidia_gpu_exporter_version }}.linux-{{ go_arch }}.tar.gz" | ||
nvidia_gpu_exporter_checksums_url: "https://github.com/{{ _nvidia_gpu_exporter_repo }}/releases/download/v{{ nvidia_gpu_exporter_version }}/checksums.txt" | ||
|
||
nvidia_gpu_exporter_skip_install: false | ||
|
||
nvidia_gpu_exporter_web_disable_exporter_metrics: false | ||
nvidia_gpu_exporter_web_listen_address: "0.0.0.0:9835" | ||
nvidia_gpu_exporter_web_telemetry_path: "/metrics" | ||
|
||
nvidia_gpu_exporter_binary_install_dir: "/usr/local/bin" | ||
nvidia_gpu_exporter_system_group: "nvidia-gpu-exp" | ||
nvidia_gpu_exporter_system_user: "{{ nvidia_gpu_exporter_system_group }}" | ||
|
||
# Local path to stash the archive and its extraction | ||
nvidia_gpu_exporter_archive_path: /tmp |
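
Review note on the defaults above: the download URL is assembled from the repository, the version and `go_arch`. The sketch below expands the same template in plain Python so the resulting asset name can be checked by eye before a run. The function name is hypothetical, and the repository value is an assumption taken from the README link, since `_nvidia_gpu_exporter_repo` is defined outside this file. The sketch does not verify that the asset exists upstream.

```python
def expand_binary_url(repo, version, go_arch):
    """Expand the binary URL template the same way the Jinja2 default does."""
    template = (
        "https://github.com/{repo}/releases/download/v{version}/"
        "nvidia_gpu_exporter_{version}.linux-{go_arch}.tar.gz"
    )
    return template.format(repo=repo, version=version, go_arch=go_arch)


# Assumed repository value; the role keeps it in _nvidia_gpu_exporter_repo.
url = expand_binary_url("utkuozdemir/nvidia_gpu_exporter", "1.2.0", "amd64")
print(url)
```

Comparing the printed URL with the asset names on the upstream release page is a quick way to catch a mismatched separator or architecture suffix.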
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
- name: Restart nvidia_gpu_exporter | ||
listen: "restart nvidia_gpu_exporter" | ||
become: true | ||
ansible.builtin.systemd: | ||
daemon_reload: true | ||
name: nvidia_gpu_exporter | ||
state: restarted | ||
when: | ||
- not ansible_check_mode |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
--- | ||
# yamllint disable rule:line-length | ||
argument_specs: | ||
main: | ||
short_description: "Prometheus Nvidia GPU Exporter" | ||
description: | ||
- "Deploy prometheus L(Nvidia GPU exporter,https://github.com/utkuozdemir/nvidia_gpu_exporter) using ansible" | ||
author: | ||
- "Prometheus Community" | ||
options: | ||
nvidia_gpu_exporter_version: | ||
description: "Nvidia GPU exporter package version. Also accepts latest as parameter." | ||
default: "1.2.0" | ||
nvidia_gpu_exporter_skip_install: | ||
description: "Nvidia GPU exporter installation tasks are skipped when set to true." | ||
type: bool | ||
default: false | ||
nvidia_gpu_exporter_binary_local_dir: | ||
description: | ||
- "Enables the use of local packages instead of those distributed on github." | ||
- "The parameter may be set to a directory where the C(nvidia_gpu_exporter) binary is stored on the host where ansible is run." | ||
- "This overrides the I(nvidia_gpu_exporter_version) parameter" | ||
nvidia_gpu_exporter_binary_url: | ||
description: "URL of the Nvidia GPU exporter binaries .tar.gz file" | ||
default: "https://github.com/{{ _nvidia_gpu_exporter_repo }}/releases/download/v{{ nvidia_gpu_exporter_version }}/nvidia_gpu_exporter_{{ nvidia_gpu_exporter_version }}.linux-{{ go_arch }}.tar.gz" | ||
nvidia_gpu_exporter_checksums_url: | ||
description: "URL of the Nvidia GPU exporter checksums file" | ||
default: "https://github.com/{{ _nvidia_gpu_exporter_repo }}/releases/download/v{{ nvidia_gpu_exporter_version }}/checksums.txt" | ||
nvidia_gpu_exporter_web_listen_address: | ||
description: "Address on which Nvidia GPU exporter will listen" | ||
default: "0.0.0.0:9835" | ||
nvidia_gpu_exporter_web_telemetry_path: | ||
description: "Path under which to expose metrics" | ||
default: "/metrics" | ||
nvidia_gpu_exporter_tls_server_config: | ||
description: | ||
- "Configuration for TLS authentication." | ||
- "Keys and values are the same as in L(nvidia_gpu_exporter docs,https://prometheus.io/docs/prometheus/latest/configuration/https/)." | ||
type: "dict" | ||
nvidia_gpu_exporter_http_server_config: | ||
description: | ||
- "Config for HTTP/2 support." | ||
- "Keys and values are the same as in L(nvidia_gpu_exporter docs,https://prometheus.io/docs/prometheus/latest/configuration/https/)." | ||
type: "dict" | ||
nvidia_gpu_exporter_basic_auth_users: | ||
description: "Dictionary of users and passwords for basic authentication. Passwords are automatically hashed with bcrypt." | ||
type: "dict" | ||
nvidia_gpu_exporter_binary_install_dir: | ||
description: | ||
- "I(Advanced)" | ||
- "Directory to install nvidia_gpu_exporter binary" | ||
default: "/usr/local/bin" | ||
nvidia_gpu_exporter_system_group: | ||
description: | ||
- "I(Advanced)" | ||
- "System group for Nvidia GPU exporter" | ||
default: "nvidia-gpu-exp" | ||
nvidia_gpu_exporter_system_user: | ||
description: | ||
- "I(Advanced)" | ||
- "Nvidia GPU exporter user" | ||
default: "nvidia-gpu-exp" | ||
nvidia_gpu_exporter_archive_path: | ||
description: 'Local path to stash the archive and its extraction' | ||
default: "/tmp" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
--- | ||
galaxy_info: | ||
author: "Prometheus Community" | ||
description: "Nvidia GPU exporter" | ||
license: "Apache" | ||
min_ansible_version: "2.9" | ||
platforms: | ||
- name: "Ubuntu" | ||
versions: | ||
- "focal" | ||
- "jammy" | ||
- name: "Debian" | ||
versions: | ||
- "bullseye" | ||
- "buster" | ||
- name: "EL" | ||
versions: | ||
- "7" | ||
- "8" | ||
- "9" | ||
- name: "Fedora" | ||
versions: | ||
- "37" | ||
- "38" | ||
galaxy_tags: | ||
- "monitoring" | ||
- "prometheus" | ||
- "exporter" | ||
- "metrics" | ||
- "system" |
roles/nvidia_gpu_exporter/molecule/alternative/molecule.yml (18 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
--- | ||
provisioner: | ||
inventory: | ||
group_vars: | ||
all: | ||
nvidia_gpu_exporter_binary_local_dir: "/tmp/nvidia_gpu_exporter-linux-amd64" | ||
nvidia_gpu_exporter_web_listen_address: | ||
- '127.0.0.1:9835' | ||
|
||
nvidia_gpu_exporter_tls_server_config: | ||
cert_file: /etc/nvidia_gpu_exporter/tls.cert | ||
key_file: /etc/nvidia_gpu_exporter/tls.key | ||
nvidia_gpu_exporter_http_server_config: | ||
http2: true | ||
nvidia_gpu_exporter_basic_auth_users: | ||
randomuser: examplepassword | ||
go_arch: amd64 | ||
nvidia_gpu_exporter_version: 1.2.0 |
roles/nvidia_gpu_exporter/molecule/alternative/prepare.yml (78 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
--- | ||
- name: Run local preparation | ||
hosts: localhost | ||
gather_facts: false | ||
tasks: | ||
- name: Download nvidia_gpu_exporter binary to local folder | ||
become: false | ||
ansible.builtin.get_url: | ||
url: "https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v{{\ | ||
\ nvidia_gpu_exporter_version }}/nvidia_gpu_exporter-{{ nvidia_gpu_exporter_version }}.linux-{{\ | ||
\ go_arch }}.tar.gz" | ||
dest: "/tmp/nvidia_gpu_exporter-{{ nvidia_gpu_exporter_version }}.linux-{{ go_arch }}.tar.gz" | ||
mode: "0644" | ||
register: _download_binary | ||
until: _download_binary is succeeded | ||
retries: 5 | ||
delay: 2 | ||
check_mode: false | ||
|
||
- name: Unpack nvidia_gpu_exporter binary | ||
become: false | ||
ansible.builtin.unarchive: | ||
src: "/tmp/nvidia_gpu_exporter-{{ nvidia_gpu_exporter_version }}.linux-{{ go_arch }}.tar.gz" | ||
dest: "/tmp" | ||
creates: "/tmp/nvidia_gpu_exporter-{{ nvidia_gpu_exporter_version }}.linux-{{ go_arch\ | ||
\ }}/nvidia_gpu_exporter" | ||
check_mode: false | ||
|
||
- name: Link to nvidia_gpu_exporter binaries directory | ||
become: false | ||
ansible.builtin.file: | ||
src: "/tmp/nvidia_gpu_exporter-{{ nvidia_gpu_exporter_version }}.linux-amd64" | ||
dest: "/tmp/nvidia_gpu_exporter-linux-amd64" | ||
state: link | ||
check_mode: false | ||
|
||
- name: Install pyOpenSSL for certificate generation | ||
ansible.builtin.pip: | ||
name: "pyOpenSSL" | ||
|
||
- name: Create private key | ||
community.crypto.openssl_privatekey: | ||
path: "/tmp/tls.key" | ||
|
||
- name: Create CSR | ||
community.crypto.openssl_csr: | ||
path: "/tmp/tls.csr" | ||
privatekey_path: "/tmp/tls.key" | ||
|
||
- name: Create certificate | ||
community.crypto.x509_certificate: | ||
path: "/tmp/tls.cert" | ||
csr_path: "/tmp/tls.csr" | ||
privatekey_path: "/tmp/tls.key" | ||
provider: selfsigned | ||
|
||
- name: Run target preparation | ||
hosts: all | ||
any_errors_fatal: true | ||
tasks: | ||
- name: Create nvidia_gpu_exporter cert dir | ||
ansible.builtin.file: | ||
path: "{{ nvidia_gpu_exporter_tls_server_config.cert_file | dirname }}" | ||
state: directory | ||
owner: root | ||
group: root | ||
mode: u+rwX,g+rwX,o=rX | ||
|
||
- name: Copy cert and key | ||
ansible.builtin.copy: | ||
src: "{{ item.src }}" | ||
dest: "{{ item.dest }}" | ||
mode: "{{ item.mode | default('0644') }}" | ||
loop: | ||
- src: "/tmp/tls.cert" | ||
dest: "{{ nvidia_gpu_exporter_tls_server_config.cert_file }}" | ||
- src: "/tmp/tls.key" | ||
dest: "{{ nvidia_gpu_exporter_tls_server_config.key_file }}" |
roles/nvidia_gpu_exporter/molecule/alternative/tests/test_alternative.py (44 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
from __future__ import (absolute_import, division, print_function) | ||
__metaclass__ = type | ||
|
||
import os | ||
import testinfra.utils.ansible_runner | ||
import pytest | ||
|
||
testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner( | ||
os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all') | ||
|
||
|
||
def test_directories(host): | ||
dirs = [ | ||
"/var/lib/nvidia_gpu_exporter" | ||
] | ||
for dir in dirs: | ||
d = host.file(dir) | ||
assert not d.exists | ||
|
||
|
||
def test_service(host): | ||
s = host.service("nvidia_gpu_exporter") | ||
try: | ||
assert s.is_running | ||
except AssertionError: | ||
# Capture service logs | ||
journal_output = host.run('journalctl -u nvidia_gpu_exporter --since "1 hour ago"') | ||
print("\n==== journalctl -u nvidia_gpu_exporter Output ====\n") | ||
print(journal_output.stdout) | ||
print("\n============================================\n") | ||
raise # Re-raise the original assertion error | ||
|
||
|
||
def test_protecthome_property(host): | ||
s = host.service("nvidia_gpu_exporter") | ||
p = s.systemd_properties | ||
assert p.get("ProtectHome") == "yes" | ||
|
||
|
||
@pytest.mark.parametrize("sockets", [ | ||
"tcp://127.0.0.1:9835", | ||
]) | ||
def test_socket(host, sockets): | ||
assert host.socket(sockets).is_listening |
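
Review note on `test_socket` above: when the test fails, it can help to probe the address by hand on the target host. The sketch below is a standard-library probe that attempts a TCP connection. It does not reproduce how testinfra implements `is_listening`; it uses a different technique (an active connect), and the function name is hypothetical.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical usage on the target host, matching the molecule group_vars:
# print(is_listening("127.0.0.1", 9835))
```

A successful connect only shows that something accepts connections on that port. It does not confirm that the listener is the exporter.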
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
--- | ||
provisioner: | ||
inventory: | ||
group_vars: | ||
all: | ||
nvidia_gpu_exporter_web_listen_address: "127.0.0.1:9835" |