[guide] Profiling
Profiling a PyOP2 program is as simple as profiling any other Python code. Let's use the jacobi demo in the PyOP2 demo folder:
python -m cProfile -o jacobi.dat jacobi.py
This will run the entire program under cProfile and write the profiling data to jacobi.dat. Omitting -o prints a summary to stdout instead, which is not very helpful in most cases.
Luckily there is a much better way of representing the profiling data: the excellent gprof2dot can turn it into a call graph. Install it from PyPI with sudo pip install gprof2dot.
Use it as follows to create a PDF:
gprof2dot -f pstats -n 1 jacobi.dat | dot -Tpdf -o jacobi.pdf
-f pstats tells gprof2dot that it is dealing with Python cProfile data (and not actual gprof data) and -n 1 ignores everything that makes up less than 1% of the total runtime - most likely you are not interested in that (the default is 0.5).
To aggregate the profiling data from all time steps, save the following as concat.py:
"""Usage: concat.py PATTERN FILE"""
import sys
from glob import glob
from pstats import Stats
if len(sys.argv) != 3:
print __doc__
sys.exit(1)
files = glob(sys.argv[1])
s = Stats(files[0])
for f in files[1:]: s.add(f)
s.dump_stats(sys.argv[2])
Use it as python concat.py '<basename>.*.part' <basename>.dat and then call gprof2dot as before.
PyOP2 automatically times the execution of certain regions:
- sparsity building
- Plan construction
- parallel loop kernel execution
- PETSc Krylov solver
To output those timings, call profiling.summary() in your PyOP2 program or run with the environment variable PYOP2_PRINT_SUMMARY set to 1.
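For example, a minimal sketch of printing the summary at the end of a run (assuming the summary function is importable from the pyop2.profiling module referenced below) could look like this:
# Sketch only: the exact import path for summary() is an assumption.
from pyop2.profiling import summary

# ... set up sets, maps and dats and execute your parallel loops / solves ...

summary()  # print the automatically collected region timings to stdout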
To add additional timers to your own code, you can use the timed_region and timed_function helpers from pyop2.profiling.
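As an illustration, here is a minimal sketch, assuming timed_region acts as a context manager and timed_function as a decorator, each taking a label:
import time
from pyop2.profiling import timed_region, timed_function

# Sketch only: the label-based context-manager/decorator usage shown here
# is an assumption about the helpers; time.sleep stands in for real work.

def assemble():
    with timed_region("assembly"):   # time everything inside this block
        time.sleep(0.01)

@timed_function("output")            # time every call to this function
def write_output():
    time.sleep(0.01)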
There are a few caveats:
- PyOP2 delays computation, which means timing a parallel loop call will not time its execution, since the evaluation only happens when the result is requested. To disable lazy evaluation of parallel loops, set the environment variable PYOP2_LAZY to 0.
- Kernel execution with CUDA and OpenCL is asynchronous (though OpenCL kernels are currently launched synchronously), which means the time recorded for kernel execution is only the time for the kernel launch.
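Putting the pieces together, a profiling run that disables lazy evaluation and prints the built-in timing summary can be invoked by combining the environment variables above with the cProfile command from the start of this guide:
PYOP2_LAZY=0 PYOP2_PRINT_SUMMARY=1 python -m cProfile -o jacobi.dat jacobi.py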