This document describes how to use the Intercept Layer for OpenCL Applications to collect and aggregate low-level performance metrics for Intel GPU OpenCL devices using the Intel Metrics Discovery API, also known as MDAPI.
MDAPI supports two methods of metrics collection: event-based sampling, where metrics are collected before and after a specific event, and time-based sampling, where metrics are collected at regular time intervals.
Both sampling methods are supported by the Intercept Layer for OpenCL Applications, but support for MDAPI time-based sampling should be considered experimental. MDAPI time-based sampling results are collected and logged to a CSV file, but additional post-processing is required to analyze or visualize the data.
MDAPI event-based sampling is implemented as an extension to standard OpenCL event profiling. Effectively, an OpenCL command queue must be created that supports MDAPI event-based sampling, similar to the way an OpenCL command queue is created that supports event profiling. Commands that are enqueued into this command queue support a new clGetEventProfilingInfo query to return a buffer containing MDAPI performance counter deltas. This buffer can then be passed to MDAPI to decode and log the MDAPI performance counters for each event.
MDAPI event-based sampling has been supported for some time and is the most robust mechanism to collect MDAPI performance metrics, but it has some limitations:
- MDAPI event-based sampling is only available on Windows and newer Linux drivers.
- MDAPI event-based sampling is unlikely to be supported on OSX.
- The API to create an OpenCL command queue that supports MDAPI event-based sampling currently does not support newer OpenCL command queue properties such as command queue priority hints and throttle hints. OpenCL command queues that are created with these properties will not support MDAPI event-based sampling.
Additionally, because MDAPI event-based sampling relies on an extension to event profiling, event-based sampling may serialize all command execution, complicating performance analysis when commands would otherwise execute concurrently, such as in an out-of-order command queue.
To enable MDAPI Event-Based Sampling, set the following controls:
- DevicePerfCounterCustom: Specify the metrics to collect, typically "ComputeBasic".
- DevicePerfCounterEventBasedSampling: Set to 1 (enabled).
- DevicePerfCounterTiming: Optionally, set to 1 (enabled) to include aggregated metrics in the report.
These controls can be enabled via cliloader by specifying the --mdapi-ebs option.
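As a rough illustration of the controls listed above, the sketch below wraps a command so that it runs with the event-based sampling controls set. It assumes the controls are read from environment variables with a CLI_ prefix, as is the case on Linux, and that the Intercept Layer is already being loaded by the application; the function name run_with_mdapi_ebs and the application ./my_opencl_app are hypothetical.

```shell
# Sketch: run a command with the event-based sampling controls set.
# Assumption: controls are read from CLI_-prefixed environment variables
# (as on Linux) and the Intercept Layer is already loaded by the program.
run_with_mdapi_ebs() {
    env \
        CLI_DevicePerfCounterCustom=ComputeBasic \
        CLI_DevicePerfCounterEventBasedSampling=1 \
        CLI_DevicePerfCounterTiming=1 \
        "$@"
}

# Example usage with a hypothetical application:
#   run_with_mdapi_ebs ./my_opencl_app
# Roughly the same thing through the loader:
#   cliloader --mdapi-ebs ./my_opencl_app
```

The cliloader form is usually more convenient because it also takes care of loading the Intercept Layer; the environment-variable form is mainly useful when the layer is already installed and a different metric set is wanted.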
MDAPI time-based sampling is implemented entirely within MDAPI itself, and relies on internal MDAPI instrumentation and buffers to collect and report performance metrics at regular time intervals.
Because MDAPI time-based sampling does not rely on any functionality in the OpenCL implementation itself, it is supported wherever MDAPI is supported. MDAPI time-based sampling has been tested on Windows, Linux, and OSX.
MDAPI time-based sampling is not as precise as MDAPI event-based sampling, but because MDAPI time-based sampling does not rely on event profiling, MDAPI time-based sampling does not serialize command execution and can measure and profile concurrent execution via out-of-order command queues.
To enable MDAPI Time-Based Sampling, set the following controls:
- DevicePerfCounterCustom: Specify the metrics to collect, typically "ComputeBasic".
- DevicePerfCounterTimeBasedSampling: Set to 1 (enabled).
These controls can be enabled via cliloader by specifying the --mdapi-tbs option.
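Because time-based sampling results are only logged to a CSV file, some post-processing is needed before they are useful. The sketch below is one minimal way to get a first summary: it prints the mean of every numeric column. It assumes the first row is a header and that fields are comma-separated with no quoted commas; the actual layout of the MDAPI CSV file may differ, and the function name summarize_metrics_csv is hypothetical.

```shell
# Sketch: print "column name,mean" for every numeric column in a CSV file.
# Assumption: row 1 is a header and fields contain no quoted commas.
summarize_metrics_csv() {
    awk -F, '
        NR == 1 {
            # Remember the column names from the header row.
            for (i = 1; i <= NF; i++) name[i] = $i
            ncols = NF
            next
        }
        {
            for (i = 1; i <= NF; i++) {
                # Only accumulate fields that look like plain numbers.
                if ($i ~ /^ *-?[0-9]+(\.[0-9]+)? *$/) {
                    sum[i] += $i
                    count[i]++
                }
            }
        }
        END {
            for (i = 1; i <= ncols; i++)
                if (count[i] > 0)
                    printf "%s,%.3f\n", name[i], sum[i] / count[i]
        }
    ' "$1"
}

# Example usage with a hypothetical log file path:
#   summarize_metrics_csv path/to/metrics.csv
```

Columns that contain no numeric values, such as names or labels, are skipped in the output.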
- On Windows, the MDAPI library is distributed with the GPU driver.
- On Linux, the MDAPI library should be built and installed from source.
- Linux may also require the "metrics library". If required, it should also be built and installed from source.
- On Linux, some GPUs (specifically the "Gen11" and "Gen12"-based GPUs, including the Arc A-series "Alchemist" discrete GPUs) require the out-of-tree i915 kernel mode driver. Please refer to the installation docs for instructions on how to install the out-of-tree kernel mode driver.
- On OSX, the path to the MDAPI library should be set manually with the DevicePerfCounterLibName control. The library is named libigdmd.dylib and it usually resides under /System/Library/Extensions/AppleIntel<CPU NAME>GraphicsMTLDriver.bundle/Contents/MacOS/libigdmd.dylib, where <CPU NAME> is a short name of your CPU generation. For example, on Kaby Lake machines <CPU NAME> is KBL. You can also add the path to the libigdmd.dylib library to the DYLD_LIBRARY_PATH environment variable, so that it can be found system-wide.
- On systems with multiple GPUs, metrics may only be collected for one GPU at a time. Use the control DevicePerfCounterAdapterIndex to choose which GPU to collect metrics for. This control may also be set via cliloader, by passing the --mdapi-device option.
- To enumerate the available GPUs for metric collection and their adapter indices, use cliloader and pass the --mdapi-devices option.
- To enumerate available metrics, use cliloader and pass the --metrics option.
- Collecting MDAPI metrics currently requires elevated privileges because metrics are collected system-wide.
- On Linux, MDAPI metrics may be enabled for non-root users by setting /proc/sys/dev/i915/perf_stream_paranoid to 0:
  $ echo 0 > /proc/sys/dev/i915/perf_stream_paranoid
  or:
  $ sysctl dev.i915.perf_stream_paranoid=0
  For more information, see:
- MDAPI metrics are logged to CSV files in the usual log directory.
- To debug MDAPI issues, consider enabling MDAPI logging by defining MD_DEBUG in MetricsDiscoveryHelper.cpp.
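For the Linux non-root setting described in the notes above, it can help to check the current value before profiling. The sketch below reports whether non-root collection appears to be enabled. The default path is the one given above; the function name check_perf_stream_paranoid and the PARANOID_FILE override are hypothetical and exist only so the check can be tried against a test file.

```shell
# Sketch: report whether non-root metric collection appears to be enabled.
# PARANOID_FILE is a hypothetical override used only for dry runs.
check_perf_stream_paranoid() {
    file="${PARANOID_FILE:-/proc/sys/dev/i915/perf_stream_paranoid}"
    if [ ! -r "$file" ]; then
        echo "unknown: $file is not readable"
    elif [ "$(cat "$file")" = "0" ]; then
        echo "non-root metrics collection is enabled"
    else
        echo "elevated privileges are required for metrics collection"
    fi
}

# To change the setting from a non-root shell, the write itself must run
# as root. "sudo echo 0 > file" fails because the redirection is performed
# by the unprivileged shell. Either of these works when sudo is available:
#   echo 0 | sudo tee /proc/sys/dev/i915/perf_stream_paranoid
#   sudo sysctl dev.i915.perf_stream_paranoid=0
```

A value written this way does not persist across reboots, so the check is worth repeating at the start of a profiling session.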
* Other names and brands may be claimed as the property of others.
Copyright (c) 2018-2024, Intel(R) Corporation