Merge pull request #3458 from bobrik/ivan/no-accelerator
Remove mentions of accelerator from the docs
iwankgb authored Jan 21, 2024
2 parents 786dbcf + 13df731 commit 27f1e92
Showing 4 changed files with 14 additions and 67 deletions.
30 changes: 0 additions & 30 deletions deploy/kubernetes/overlays/examples/gpu-privilages.yaml

This file was deleted.

25 changes: 1 addition & 24 deletions docs/running.md
@@ -19,7 +19,7 @@ sudo docker run \

cAdvisor is now running (in the background) on `http://localhost:8080/`. The setup includes directories with Docker state cAdvisor needs to observe.

**Note**:
- If the docker daemon is running with [user namespace enabled](https://docs.docker.com/engine/reference/commandline/dockerd/#starting-the-daemon-with-user-namespaces-enabled),
you need to add the `--userns=host` option for cAdvisor to monitor Docker containers;
otherwise cAdvisor cannot connect to the docker daemon.
@@ -122,26 +122,3 @@ cAdvisor is now running (in the foreground) on `http://localhost:8080/`.
## Runtime Options

cAdvisor has a series of flags that can be used to configure its runtime behavior. More details can be found in runtime [options](runtime_options.md).

## Hardware Accelerator Monitoring

cAdvisor can export some metrics for hardware accelerators attached to containers.
Currently only Nvidia GPUs are supported, and there are no machine-level metrics,
so metrics won't show up unless a container with accelerators attached is running.
Metrics will only show up if accelerators are explicitly attached to the container, e.g., by passing the `--device /dev/nvidia0:/dev/nvidia0` flag to docker.
If nothing is explicitly attached to the container, metrics will NOT show up; this can happen when you access accelerators from privileged containers.

cAdvisor needs two things to show Nvidia GPU metrics:
- access to the NVML library (`libnvidia-ml.so.1`).
- access to the GPU devices.

If you are running cAdvisor inside a container, you will need to do the following to give the container access to the NVML library:
```
-e LD_LIBRARY_PATH=<path-where-nvml-is-present>
--volume <above-path>:<above-path>
```

If you are running cAdvisor inside a container, you can do one of the following to give it access to the GPU devices:
- Run with `--privileged`
- If you are on docker v17.04.0-ce or above, run with `--device-cgroup-rule 'c 195:* mrw'`
- Run with `--device /dev/nvidiactl:/dev/nvidiactl /dev/nvidia0:/dev/nvidia0 /dev/nvidia1:/dev/nvidia1 <and-so-on-for-all-nvidia-devices>`
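
For reference, a minimal sketch combining the flags from this (now-removed) section with a standard cAdvisor invocation; the NVML path `/usr/lib/nvidia` and the volume set are illustrative assumptions, not part of the original docs:

```
# Hypothetical example: give cAdvisor access to the NVML library and to all
# Nvidia devices through a device cgroup rule (docker v17.04.0-ce or above).
# /usr/lib/nvidia is an assumed NVML location; adjust it for your host.
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/usr/lib/nvidia:/usr/lib/nvidia \
  -e LD_LIBRARY_PATH=/usr/lib/nvidia \
  --device-cgroup-rule='c 195:* mrw' \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  gcr.io/cadvisor/cadvisor
```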
24 changes: 12 additions & 12 deletions docs/runtime_options.md
@@ -10,7 +10,7 @@ This document describes a set of runtime flags available in cAdvisor.

* `--env_metadata_whitelist`: a comma-separated list of environment variable keys that need to be collected for containers; only the containerd and docker runtimes are supported for now.
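
For example, a quick sketch of the flag in use (the chosen variable names are arbitrary):

```
# Illustrative only: collect the PATH and HOSTNAME environment variables
# for each container (containerd and docker runtimes only).
cadvisor --env_metadata_whitelist=PATH,HOSTNAME
```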

## Limiting which containers are monitored
* `--docker_only=false` - do not report raw cgroup metrics, except the root cgroup.
* `--raw_cgroup_prefix_whitelist` - a comma-separated list of cgroup path prefixes that need to be collected even when `--docker_only` is specified
* `--disable_root_cgroup_stats=false` - disable collecting root Cgroup stats.
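
A sketch combining these flags (the `/kubepods` prefix is an assumption for illustration):

```
# Illustrative: report Docker containers plus raw cgroups under /kubepods,
# and skip collecting stats for the root cgroup.
cadvisor --docker_only \
  --raw_cgroup_prefix_whitelist=/kubepods \
  --disable_root_cgroup_stats
```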
@@ -134,8 +134,8 @@ cAdvisor stores the latest historical data in memory. How long of a history it s
--application_metrics_count_limit=100: Max number of application metrics to store (per container) (default 100)
--collector_cert="": Collector's certificate, exposed to endpoints for certificate based authentication.
--collector_key="": Key for the collector's certificate
- --disable_metrics=<metrics>: comma-separated list of metrics to be disabled. Options are accelerator,advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp. (default advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,referenced_memory,resctrl,sched,tcp,udp)
- --enable_metrics=<metrics>: comma-separated list of metrics to be enabled. If set, overrides 'disable_metrics'. Options are accelerator,advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp.
+ --disable_metrics=<metrics>: comma-separated list of metrics to be disabled. Options are advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,psi_avg,psi_total,referenced_memory,resctrl,sched,tcp,udp. (default advtcp,cpu_topology,cpuset,hugetlb,memory_numa,process,referenced_memory,resctrl,sched,tcp,udp)
+ --enable_metrics=<metrics>: comma-separated list of metrics to be enabled. If set, overrides 'disable_metrics'. Options are advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,psi_avg,psi_total,referenced_memory,resctrl,sched,tcp,udp.
--prometheus_endpoint="/metrics": Endpoint to expose Prometheus metrics on (default "/metrics")
--disable_root_cgroup_stats=false: Disable collecting root Cgroup stats
```
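
As an illustration, the exported metric set can be narrowed like this (the chosen metrics are arbitrary):

```
# Illustrative: export only CPU, memory, network, and disk I/O metrics;
# --enable_metrics overrides any --disable_metrics setting.
cadvisor --enable_metrics=cpu,memory,network,diskIO
```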
@@ -191,7 +191,7 @@ in mind that it is impossible to group more events than there are counters avail

#### Getting config values
Using perf tools:
* Identify the event in `perf list` output.
* Execute command: `perf stat -I 5000 -vvv -e EVENT_NAME`
* Find the `perf_event_attr` section in the `perf stat` output and copy the config and type fields to the configuration file.

Expand All @@ -208,7 +208,7 @@ perf_event_attr:
exclude_guest 1
------------------------------------------------------------
```
* The configuration file should look like:
```json
{
"core": {
@@ -242,15 +242,15 @@
}
```

Config values can also be obtained from:
* [Intel® 64 and IA32 Architectures Performance Monitoring Events](https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia32-architectures-performance-monitoring-events.html)


##### Uncore Events configuration
An uncore event name should be in the form `PMU_PREFIX/event_name`, where **PMU_PREFIX** means
that statistics will be counted on all PMUs with that prefix in their name.

Let's explain this with an example:

```json
{
@@ -260,7 +260,7 @@ Let's explain this by example:
"uncore_imc_0/cas_count_write",
"cas_count_all"
],
"custom_events": [
"custom_events": [
{
"config": [
"0x304"
@@ -419,11 +419,11 @@ See example configuration below:
```

In the example above:
* `instructions` will be measured as a non-grouped event and is specified using the human-friendly interface that can be
obtained by calling `perf list`. You can use any name that appears in the output of the `perf list` command. This is
the interface that the majority of users will rely on.
* `instructions_retired` will be measured as a non-grouped event and is specified using an advanced API that allows
specifying any available perf event (some of them are not named and can't be specified with a plain string). The event name
should be a human-readable string that will become a metric name.
* `cas_count_read` will be measured as an uncore non-grouped event on all Integrated Memory Controller Performance Monitoring Units because of the unset `type` field and
the `uncore_imc` prefix.
@@ -435,7 +435,7 @@ Resctrl file system is not hierarchical like cgroups, so users should set `--doc

```
--resctrl_interval=0: Resctrl mon groups updating interval. Zero value disables updating mon groups.
```
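
A sketch, assuming the flag accepts a Go duration string:

```
# Illustrative: update resctrl mon groups every 10 seconds; pair with
# --docker_only since the resctrl filesystem is not hierarchical.
cadvisor --docker_only --resctrl_interval=10s
```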

## Storage driver specific instructions:

2 changes: 1 addition & 1 deletion stats/types.go
@@ -22,7 +22,7 @@ import info "github.com/google/cadvisor/info/v1"
// For each container detected by the cAdvisor manager, it will call
// GetCollector() with the devices cgroup path for that container.
// GetCollector() is supposed to return an object that can update
-// accelerator stats for that container.
+// external stats for that container.
type Manager interface {
Destroy()
GetCollector(deviceCgroup string) (Collector, error)
