diff --git a/docs/data/causal-foobar.png b/docs/data/causal-foobar.png new file mode 100644 index 000000000..a887b126a Binary files /dev/null and b/docs/data/causal-foobar.png differ diff --git a/docs/how-to/instrumenting-rewriting-binary-application.rst b/docs/how-to/instrumenting-rewriting-binary-application.rst index f85d9050b..dfc1a2307 100644 --- a/docs/how-to/instrumenting-rewriting-binary-application.rst +++ b/docs/how-to/instrumenting-rewriting-binary-application.rst @@ -19,7 +19,7 @@ There are three ways to perform instrumentation with `Omnitrace ``) +* Attaching to a process that is currently running (analogous to ``gdb -p ``) * This mode is activated via ``-p `` * Same caveats as ``omnitrace-instrument`` with respect to memory and overhead @@ -514,7 +514,7 @@ were available for instrumentation, which functions were instrumented, which functions were excluded, and which functions contained overlapping function bodies. The default output path of these files will be in a ``omnitrace--output`` folder where ```` is the base name of the targeted binary or -(in the case of binary rewrite, the basename of the resulting executable), e.g. +(in the case of binary rewrite, the base name of the resulting executable), e.g. ``omnitrace-instrument -- ls`` will output its files to ``omnitrace-ls-output`` whereas ``omnitrace-instrument -o ls.inst -- ls`` will output to ``omnitrace-ls.inst-output``. diff --git a/docs/how-to/performing-causal-profiling.rst b/docs/how-to/performing-causal-profiling.rst new file mode 100644 index 000000000..6e4682ada --- /dev/null +++ b/docs/how-to/performing-causal-profiling.rst @@ -0,0 +1,618 @@ +.. 
meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +**************************************************** +Performing causal profiling +**************************************************** + +The process of causal profiling can be summarized as: + +*If you speed up a given block of code by X%, the application will execute Y% faster*. + +Causal profiling directs parallel application developers to where they should focus their optimization +efforts by quantifying the potential impact of optimizations. Causal profiling is rooted in the concept +that *software execution speed is relative*: speeding up a block of code by X% is mathematically equivalent +to that block of code running at its current speed while all the other code runs slower by X%. +Thus, causal profiling works by performing experiments on blocks of code during program execution which +insert pauses to slow down all other concurrently running code. During post-processing, these experiments +are translated into calculations for the potential impact of speeding up this block of code. + +Consider the following C++ code executing ``foo`` and ``bar`` concurrently in two different threads +where ``foo`` is 30% faster than ``bar`` (ideally): + +.. code-block:: cpp + + #include <cstddef> + #include <thread> + constexpr size_t FOO_N = 7 * 1000000000UL; + constexpr size_t BAR_N = 10 * 1000000000UL; + + void foo() + { + for(volatile size_t i = 0; i < FOO_N; ++i) {} + } + + void bar() + { + for(volatile size_t i = 0; i < BAR_N; ++i) {} + } + + int main() + { + std::thread _threads[] = { std::thread{ foo }, + std::thread{ bar } }; + + for(auto& itr : _threads) + itr.join(); + } + +No matter how many optimizations are applied to ``foo``, the application will always +require the same amount of time +because the end-to-end performance is limited by ``bar``. 
However, a 5% speed-up +in ``bar`` will result in the +end-to-end performance improving by 5% and this trend will continue linearly (10% speed-up +in ``bar`` yields 10% speed-up in +end-to-end performance, and so on) up to 30% speed-up, at which point, ``bar`` executes as fast as ``foo``; +any speed-up to ``bar`` beyond 30% will still only yield an end-to-end performance +speed-up of 30% since the application +will be limited by performance of ``foo``, as demonstrated below in the causal +profiling visualization: + +.. image:: ../data/causal-foobar.png + :alt: Visualization of the performance improvements for two functions with causal profiling + +The full details of the causal profiling methodology can be found in the paper +`Coz: Finding Code that Counts with Causal Profiling `_. +The author's implementation is publicly available on `GitHub `_. + +Getting started +======================================== + +To effectively use causal profiling, it is important to understand a few key +concepts, such as progress points. + +Progress points +----------------------------------- + +Causal profiling requires "progress points" to track progress through the code +in between samples. Progress points must be triggered deterministically via instrumentation. +This can happen in three different ways: + +* `Omnitrace `_ can leverage the callbacks from + Kokkos-Tools, OpenMP-Tools, roctracer, etc. and the wrappers around functions for + MPI, NUMA, RCCL, etc. to act as progress points +* Users can leverage the :doc:`runtime instrumentation capabilities <./instrumenting-rewriting-binary-application>` + to insert progress points +* Users can leverage the :doc:`User API <../reference/using-omnitrace-display-api>`, + for example ``OMNITRACE_CAUSAL_PROGRESS`` + +.. note:: + + Binary rewrite to insert progress points is not supported. When a rewritten binary + is executed, Dyninst translates the instruction pointer address in order to execute + the instrumentation. 
As a result, call-stack samples never return instruction + pointer addresses in the ranges defined as valid by Omnitrace. + +Key concepts +----------------------------------- + ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Concept | Setting | Options | Description | ++==================+=====================================+==================================+============================================+ +| Backend | ``OMNITRACE_CAUSAL_BACKEND`` | ``perf``, ``timer`` | Backend for recording samples required | +| | | | to calculate the virtual speed-up | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Mode | ``OMNITRACE_CAUSAL_MODE`` | ``function``, ``line`` | Select entire function or individual | +| | | | line of code for causal experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| End-to-end | ``OMNITRACE_CAUSAL_END_TO_END`` | boolean | Perform a single experiment during the | +| | | | entire run (does not require | +| | | | progress-points) | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Fixed speed-up | ``OMNITRACE_CAUSAL_FIXED_SPEEDUP`` | one or more values from [0, 100] | Virtual speed-up or pool of virtual | +| | | | speed-ups to randomly select | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Binary scope | ``OMNITRACE_CAUSAL_BINARY_SCOPE`` | regular expression(s) | Dynamic binaries containing code for | +| | | | experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Source scope 
| ``OMNITRACE_CAUSAL_SOURCE_SCOPE`` | regular expression(s) | ```` and/or ``:`` | +| | | | containing code to include in experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Function scope | ``OMNITRACE_CAUSAL_FUNCTION_SCOPE`` | regular expression(s) | Restricts experiments to matching | +| | | | functions (function mode) or lines of | +| | | | code within matching functions (line mode) | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ + +.. note:: + + * Binary scope defaults to ``%MAIN%`` (executable), but the scope can be expanded to include linked libraries. + * ```` and ``:`` support requires debug info (i.e. code was compiled with ``-g`` or, preferably, ``-g3``) + * Function mode does not require debug info but does not support stripped binaries + +Backends +----------------------------------- + +Both causal profiling backends interrupt each thread 1000x per second of CPU-time to apply virtual speed-ups. +The difference between the backends is how the samples used to calculate +the virtual speed-up are recorded. +There are three key differences between the two backends: + +* ``perf`` backend requires Linux Perf and elevated security privileges +* ``perf`` backend interrupts the application less frequently whereas the ``timer`` backend + will interrupt the application 1000x per second of real time +* ``timer`` backend has less accurate call-stacks due to instruction pointer skid + +In general, the ``perf`` backend is preferred over the ``timer`` backend when +security privileges permit its use. +If ``OMNITRACE_CAUSAL_BACKEND`` is set to ``auto``, Omnitrace will fall back +to using the ``timer`` backend only if +using the ``perf`` backend fails; if ``OMNITRACE_CAUSAL_BACKEND`` is +set to ``perf`` and using this backend fails, Omnitrace +will abort. 
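The fallback rule described above can be sketched as a small shell function. This is only an illustration of the documented behavior: the name ``select_backend`` and its two arguments are hypothetical and not part of Omnitrace, and the real decision is made inside the Omnitrace library.

```shell
#!/bin/bash
# Hypothetical helper (NOT part of Omnitrace): mirrors the documented rule for
# OMNITRACE_CAUSAL_BACKEND.
#   $1 = requested backend: auto | perf | timer
#   $2 = "yes" if the perf backend can be used on this system, anything else otherwise
select_backend() {
    local requested="$1"
    local perf_usable="$2"
    case "${requested}" in
        timer)
            # The timer backend was requested explicitly.
            echo "timer"
            ;;
        perf)
            # perf was requested explicitly: a failure is fatal (Omnitrace aborts).
            if [ "${perf_usable}" = "yes" ]; then
                echo "perf"
            else
                echo "perf backend requested but unavailable" >&2
                return 1
            fi
            ;;
        auto)
            # auto: prefer perf, fall back to timer only if perf fails.
            if [ "${perf_usable}" = "yes" ]; then
                echo "perf"
            else
                echo "timer"
            fi
            ;;
        *)
            echo "unknown backend: ${requested}" >&2
            return 2
            ;;
    esac
}
```

For example, ``select_backend auto no`` prints ``timer``, whereas ``select_backend perf no`` prints an error and returns a non-zero status.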
+ +Instruction pointer skid +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Instruction pointer (IP) skid is how many instructions execute between an event of interest +happening and where the IP is when the kernel is able to stop the application. +For the ``timer`` backend, this translates to the +difference between the IP when the timer fired and the IP when the +signal was actually delivered. Although IP skid does still occur with the ``perf`` backend, +the overhead of pausing the entire thread with the ``timer`` backend makes this much more pronounced +and, as such, the ``timer`` backend tends to have a lower resolution than the ``perf`` backend, +especially in ``line`` mode. + +Installing Linux Perf +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Linux Perf is built into the kernel and may already be installed +(e.g., included in the default kernel for OpenSUSE). +The official method of checking whether Linux Perf is installed is +checking for the existence of the file +``/proc/sys/kernel/perf_event_paranoid`` -- if the file exists, the kernel has Perf installed. + +If this file does not exist, on Debian-based systems like Ubuntu, install Linux Perf (as superuser): + +.. code-block:: shell + + apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r) + +and reboot your computer. In order to use the ``perf`` backend, the value +of ``/proc/sys/kernel/perf_event_paranoid`` +should be <= 2. If the value in this file is greater than 2, you will likely be +unable to use the ``perf`` backend. + +To update the paranoid level temporarily (until the system is rebooted), use +one of the following methods +as a superuser (where ``PARANOID_LEVEL=`` with ```` in the range ``[-1, 2]``): + +.. 
code-block:: shell + + echo ${PARANOID_LEVEL} | sudo tee /proc/sys/kernel/perf_event_paranoid + sysctl kernel.perf_event_paranoid=${PARANOID_LEVEL} + +To make the paranoid level persistent after a reboot, add ``kernel.perf_event_paranoid=`` +(where ```` is the desired paranoid level) to the ``/etc/sysctl.conf`` file. + +Speed-up prediction variability and the ``omnitrace-causal`` executable +----------------------------------------------------------------------- + +Causal profiling typically requires executing the application several times in +order to adequately sample all the domains of executing code, experiment +speed-ups, etc. and resolve statistical fluctuations. +The ``omnitrace-causal`` executable is designed to simplify running this procedure: + +.. code-block:: shell + + $ omnitrace-causal --help + [omnitrace-causal] Usage: ./bin/omnitrace-causal [ --help (count: 0, dtype: bool) + --version (count: 0, dtype: bool) + --monochrome (max: 1, dtype: bool) + --debug (max: 1, dtype: bool) + --verbose (count: 1) + --config (min: 0, dtype: filepath) + --launcher (count: 1, dtype: executable) + --generate-configs (min: 0, dtype: folder) + --no-defaults (min: 0, dtype: bool) + --mode (count: 1, dtype: string) + --output-name (min: 1, dtype: filename) + --reset (max: 1, dtype: bool) + --end-to-end (max: 1, dtype: bool) + --wait (count: 1, dtype: seconds) + --duration (count: 1, dtype: seconds) + --iterations (count: 1, dtype: int) + --speedups (min: 0, dtype: integers) + --binary-scope (min: 0, dtype: integers) + --source-scope (min: 0, dtype: integers) + --function-scope (min: 0, dtype: regex-list) + --binary-exclude (min: 0, dtype: integers) + --source-exclude (min: 0, dtype: integers) + --function-exclude (min: 0, dtype: regex-list) + ] + + Causal profiling usually requires multiple runs to reliably resolve the speedup estimates. + This executable is designed to streamline that process. 
+ For example (assume all commands end with '-- '): + + omnitrace-causal -n 5 -- # runs 5x with causal profiling enabled + + omnitrace-causal -s 0 5,10,15,20 # runs 2x with virtual speedups: + # - 0 + # - randomly selected from 5, 10, 15, and 20 + + omnitrace-causal -F func_A func_B func_(A|B) # runs 3x with the function scope limited to: + # 1. func_A + # 2. func_B + # 3. func_A or func_B + General tips: + - Insert progress points at hotspots in your code or use omnitrace's runtime instrumentation + - Note: binary rewrite will produce a incompatible new binary + - Run omnitrace-causal in "function" mode first (does not require debug info) + - Run omnitrace-causal in "line" mode when you are targeting one function (requires debug info) + - Preferably, use predictions from the "function" mode to determine which function to target + - Limit the virtual speedups to a smaller pool, e.g., 0,5,10,25,50, to get reliable predictions quicker + - Make use of the binary, source, and function scope to limit the functions/lines selected for experiments + - Note: source scope requires debug info + + + Options: + -h, -?, --help Shows this page + --version Prints the version and exit + + [DEBUG OPTIONS] + + --monochrome Disable colorized output + --debug Debug output + -v, --verbose Verbose output + + [GENERAL OPTIONS] + + -c, --config Base configuration file + -l, --launcher When running MPI jobs, omnitrace-causal needs to be *before* the executable which launches the MPI processes (i.e. + before `mpirun`, `srun`, etc.). Pass the name of the target executable (or a regex for matching to the name of the + target) for causal profiling, e.g., `omnitrace-causal -l foo -- mpirun -n 4 foo`. This ensures that the omnitrace + library is LD_PRELOADed on the proper target + -g, --generate-configs Generate config files instead of passing environment variables directly. 
If no arguments are provided, the config files + will be placed in ${PWD}/omnitrace-causal-config folder + --no-defaults Do not activate default features which are recommended for causal profiling. For example: PID-tagging of output files + and timestamped subdirectories are disabled by default. Kokkos tools support is added by default + (OMNITRACE_USE_KOKKOSP=ON) because, for Kokkos applications, the Kokkos-Tools callbacks are used for progress points. + Activation of OpenMP tools support is similar + + [CAUSAL PROFILING OPTIONS (General)] + (These settings will be applied to all causal profiling runs) + + -m, --mode [ function (func) | line ] + Causal profiling mode + -o, --output-name Output filename of causal profiling data w/o extension + -r, --reset Overwrite any existing experiment results during the first run + -e, --end-to-end Single causal experiment for the entire application runtime + -w, --wait Set the wait time (i.e. delay) before starting the first causal experiment (in seconds) + -d, --duration Set the length of time (in seconds) to perform causal experimentationafter the first experiment is started. Once this + amount of time has elapsed, no more causal experiments will be started but any currently running experiment will be + allowed to finish. + -n, --iterations Number of times to repeat the combination of run configurations + + [CAUSAL PROFILING OPTIONS (Combinatorial)] + (Each individual argument to these options will multiply the number runs by the number of arguments and the number of + iterations. E.g. -n 2 -B "MAIN" -F "foo" "bar" will produce 4 runs: 2 iterations x 1 binary scope x 2 function scopes + (MAIN+foo, MAIN+bar, MAIN+foo, MAIN+bar)) + + -s, --speedups Pool of virtual speedups to sample from during experimentation. Each space designates a group and multiple speedups can + be grouped together by commas, e.g. 
-s 0 0,10,20-50 is two groups: group #1 is '0' and group #2 is '0 10 20 25 30 35 40 + 45 50' + -B, --binary-scope Restricts causal experiments to the binaries matching the list of regular expressions. Each space designates a group + and multiple scopes can be grouped together with a semi-colon + -S, --source-scope Restricts causal experiments to the source files or source file + lineno pairs (i.e. or :) matching + the list of regular expressions. Each space designates a group and multiple scopes can be grouped together with a + semi-colon + -F, --function-scope Restricts causal experiments to the functions matching the list of regular expressions. Each space designates a group + and multiple scopes can be grouped together with a semi-colon + -BE, --binary-exclude Excludes causal experiments from being performed on the binaries matching the list of regular expressions. Each space + designates a group and multiple excludes can be grouped together with a semi-colon + -SE, --source-exclude Excludes causal experiments from being performed on the code from the source files or source file + lineno pair (i.e. + or :) matching the list of regular expressions. Each space designates a group and multiple excludes + can be grouped together with a semi-colon + -FE, --function-exclude Excludes causal experiments from being performed on the functions matching the list of regular expressions. Each space + designates a group and multiple excludes can be grouped together with a semi-colon + +Examples +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: shell + + #!/bin/bash -e + + module load omnitrace + + N=20 + I=3 + + # when providing speedups to omnitrace-causal, speedup + # groups are separated by a space so "0,10" results in + # one speedup group where omnitrace samples from + # the speedup set of {0, 10}. 
Passing "0 10" (without + # quotes) to omnitrace-causal multiplies the + # number of runs by 2, where the first half of the + # runs instruct omnitrace to only use 0 as the + # speedup and the second half of the runs instruct + # omnitrace to only use 10 as the speedup. + SPEEDUPS="0,0,0,10,20,30,40,50,50,75,75,75,90,90,90" + # thus, -s ${SPEEDUPS} only multiplies the number + # of runs by 1 whereas -s ${SPEEDUPS_E2E} multiplies + # the number of runs by 15: + # - 3 runs with speedup of 0 + # - 1 run for each of the speedups 10, 20, 30, and 40 + # - 2 runs with speedup of 50 + # - 3 runs with speedup of 75 + # - 3 runs with speedup of 90 + SPEEDUPS_E2E=$(echo "${SPEEDUPS}" | sed 's/,/ /g') + + + # 20 iterations in function mode with 1 speedup group + # and source scope set to .cpp files + # + # outputs to files: + # - causal/experiments.func.coz + # - causal/experiments.func.json + # + # total executions: 20 + # + omnitrace-causal \ + -n ${N} \ + -s ${SPEEDUPS} \ + -m function \ + -o experiments.func \ + -S ".*\\.cpp" \ + -- \ + ./causal-omni-cpu "${@}" + + + # 20 iterations in line mode with 1 speedup group + # and source scope restricted to lines 100 and 110 + # in the causal.cpp file. + # + # outputs to files: + # - causal/experiments.line.coz + # - causal/experiments.line.json + # + # total executions: 20 + # + omnitrace-causal \ + -n ${N} \ + -s ${SPEEDUPS} \ + -m line \ + -o experiments.line \ + -S "causal\\.cpp:(100|110)" \ + -- \ + ./causal-omni-cpu "${@}" + + + # 3 iterations in function mode of 15 singular speedups + # in end-to-end mode with 2 different function scopes + # where one is restricted to "cpu_slow_func" and + # another is restricted to "cpu_fast_func". 
+ # + # outputs to files: + # - causal/experiments.func.e2e.coz + # - causal/experiments.func.e2e.json + # + # total executions: 90 + # + omnitrace-causal \ + -n ${I} \ + -s ${SPEEDUPS_E2E} \ + -m func \ + -e \ + -o experiments.func.e2e \ + -F "cpu_slow_func" \ + "cpu_fast_func" \ + -- \ + ./causal-omni-cpu "${@}" + + # 3 iterations in line mode of 15 singular speedups + # in end-to-end mode with 2 different source scopes + # where one is restricted to line 100 in causal.cpp + # and another is restricted to line 110 in causal.cpp. + # + # outputs to files: + # - causal/experiments.line.e2e.coz + # - causal/experiments.line.e2e.json + # + # total executions: 90 + # + omnitrace-causal \ + -n ${I} \ + -s ${SPEEDUPS_E2E} \ + -m line \ + -e \ + -o experiments.line.e2e \ + -S "causal\\.cpp:100" \ + "causal\\.cpp:110" \ + -- \ + ./causal-omni-cpu "${@}" + + + export OMP_NUM_THREADS=8 + export OMP_PROC_BIND=spread + export OMP_PLACES=threads + + # set number of iterations to 5 + N=5 + + # 5 iterations in function mode of 1 speedup + # group with the source scope restricted + # to files containing "lulesh" in their filename + # and exclude functions which start with "Kokkos::" + # or "std::enable_if". + # + # outputs to files: + # - causal/experiments.func.coz + # - causal/experiments.func.json + # + # total executions: 5 + # + # First of 5 executions overwrites any + # existing causal/experiments.func.(coz|json) + # file due to "--reset" argument + # + omnitrace-causal \ + --reset \ + -n ${N} \ + -s ${SPEEDUPS} \ + -m func \ + -o experiments.func \ + -S "lulesh.*" \ + -FE "^(Kokkos::|std::enable_if)" \ + -- \ + ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p + + + # 5 iterations in line mode of 1 speedup + # group with the source scope restricted + # to files containing "lulesh" in their filename + # and exclude functions which start with "exec_range" + # or "execute" and which contain either + # "construct_shared_allocation" or "._omp_fn." in + # the function name. 
+ # + # outputs to files: + # - causal/experiments.line.coz + # - causal/experiments.line.json + # + # total executions: 5 + # + # First of 5 executions overwrites any + # existing causal/experiments.line.(coz|json) + # file due to "--reset" argument + # + omnitrace-causal \ + --reset \ + -n ${N} \ + -s ${SPEEDUPS} \ + -m line \ + -o experiments.line \ + -S "lulesh.*" \ + -FE "^(exec_range|execute);construct_shared_allocation;\\._omp_fn\\." \ + -- \ + ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p + + + # 5 iterations in line mode of 1 speedup + # group with the source scope restricted + # to files whose basename is "lulesh.cc" + # for 3 different functions: + # - ApplyMaterialPropertiesForElems + # - CalcHourglassControlForElems + # - CalcVolumeForceForElems + # + # outputs to files: + # - causal/experiments.line.targeted.coz + # - causal/experiments.line.targeted.json + # + # total executions: 15 + # + # First execution overwrites any + # existing causal/experiments.line.targeted.(coz|json) + # file due to "--reset" argument + # + omnitrace-causal \ + --reset \ + -n ${N} \ + -s ${SPEEDUPS} \ + -m line \ + -o experiments.line.targeted \ + -F "ApplyMaterialPropertiesForElems" \ + "CalcHourglassControlForElems" \ + "CalcVolumeForceForElems" \ + -S "lulesh\\.cc" \ + -- \ + ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p + +Using ``omnitrace-causal`` with other launchers like ``mpirun`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``omnitrace-causal`` executable is intended to assist with application replay +and is designed to always be at the start of the command-line (i.e. the primary process). +``omnitrace-causal`` typically adds an ``LD_PRELOAD`` of the Omnitrace libraries +into the environment before launching the command in order to inject the functionality +required to start the causal profiling tooling. 
However, this is problematic +when the target application for causal profiling requires another command-line +tool in order to run, e.g. ``foo`` is the target application but executing ``foo`` +requires ``mpirun -n 2 foo``. If one were to simply do ``omnitrace-causal -- mpirun -n 2 foo``, +then the causal profiling would be applied to ``mpirun`` instead of ``foo``. +``omnitrace-causal`` remedies this by providing a command-line option ``-l`` / ``--launcher`` +to indicate the target application is using a launcher script/executable. The +argument to the command-line option is the name of (or regex for) the target application +on the command-line. When ``--launcher`` is used, ``omnitrace-causal`` will generate +all the replay configurations and execute them but delay adding the ``LD_PRELOAD``; instead, it +will inject a call to itself into the command-line right before the target +application. This recursive call to itself will inherit the configuration from +the parent ``omnitrace-causal`` executable, insert an ``LD_PRELOAD`` into the environment, +and then invoke an ``execv`` to replace itself with the new process launched by the target +application. + +In other words, the following command: + +.. code-block:: shell + + omnitrace-causal -l foo -n 3 -- mpirun -n 2 foo + +effectively results in: + +.. code-block:: shell + + mpirun -n 2 omnitrace-causal -- foo + mpirun -n 2 omnitrace-causal -- foo + mpirun -n 2 omnitrace-causal -- foo + +Visualizing the causal output +------------------------------------------------------------------------- + +Omnitrace generates a ``causal/experiments.json`` and ``causal/experiments.coz`` in +``${OMNITRACE_OUTPUT_PATH}/${OMNITRACE_OUTPUT_PREFIX}``. A standalone GUI for viewing the causal profiling +results is under development. Until it is available, visit +`plasma-umass.org/coz `_ and open the ``*.coz`` file. 
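Before opening the viewer, it can help to locate the generated ``*.coz`` files. The following is a minimal sketch that only assumes the files sit somewhere beneath the output directory; ``list_coz_files`` is a hypothetical helper, not an Omnitrace command.

```shell
#!/bin/bash
# Hypothetical helper (NOT part of Omnitrace): list every *.coz file found
# beneath the given directory (defaults to the current directory), sorted by path.
list_coz_files() {
    local output_dir="${1:-.}"
    find "${output_dir}" -type f -name '*.coz' | sort
}
```

For example, ``list_coz_files "${OMNITRACE_OUTPUT_PATH:-.}"`` searches the configured output path, or the current directory when ``OMNITRACE_OUTPUT_PATH`` is unset.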
+ +Omnitrace versus Coz +======================================= + +This comparison is intended for readers who are familiar with the +`Coz profiler `_. +Omnitrace provides several additional features and utilities for causal profiling: + +.. csv-table:: + :header: "Feature", "Coz", "Omnitrace", "Notes" + :widths: 20, 60, 60, 30 + + "Debug info", "requires debug info in DWARF v3 format (``-gdwarf-3``)", "optional, supports any DWARF format version", "See Note #1 below" + "Experiment selection", "``:``", "```` or ``:``", "See Note #2 below" + "Experiment speed-ups", "Randomly samples between 0 and 100 in increments of 5 or one fixed speed-up", "Supports specifying smaller subset", "See Note #3 below" + "Scope options", "Supports binary and source scopes", "Supports binary, source, and function scopes", "See Notes #4, #5, and #6 below" + "Scope inclusion", "Uses ``%`` as wildcard for binary and source scopes", "Full regex support for binary, source, and function scopes", "" + "Scope exclusion", "Not supported", "Supports regexes for excluding binary/source/function", "See Note #7 below" + "Call-stack sampling", "Linux perf", "Linux perf, libunwind", "See Note #8 below" + +.. note:: + + #. Omnitrace supports a "function" mode which does not require debug info + #. Omnitrace supports selecting the entire range of instruction pointers for a function instead + of the instruction pointer for one line. In large codes, "function" mode + can resolve in fewer iterations and once a target function is identified, one can + switch to line mode and limit the function scope to the target function + #. Omnitrace supports randomly sampling from subsets, e.g. { 0, 0, 5, 10 } + where 0% is randomly selected 50% of the time and 5% and 10% are each randomly selected 25% of the time + #. Omnitrace and Coz have the same definition for binary scope: the binaries + loaded at runtime (e.g. executable and linked libraries) + #. 
Omnitrace "source scope" supports both ```` and ``:`` formats + in contrast to Coz "source scope" which requires ``:`` format + #. Omnitrace supports a "function" scope which narrows the functions/lines + which are eligible for causal experiments to those within the matching functions + #. Omnitrace supports a second filter on scopes for removing binary/source/function + caught by inclusive match, e.g. ``BINARY_SCOPE=.*`` + ``BINARY_EXCLUDE=libmpi.*`` + initially includes all binaries but the exclude regex removes the MPI libraries + #. In Omnitrace, the Linux perf backend is preferred over libunwind. However, + Linux perf usage can be restricted for security reasons. + Omnitrace will fall back to using a second POSIX timer and libunwind if + Linux perf is not available. diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index cc199b48d..6dae2c560 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -24,7 +24,9 @@ subtrees: - file: how-to/sampling-call-stack.rst title: Sampling the call stack - file: how-to/instrumenting-rewriting-binary-application.rst - title: Instrumenting and rewriting a binary application + title: Instrumenting and rewriting a binary application + - file: how-to/performing-causal-profiling.rst + title: Performing causal profiling - caption: Conceptual entries: