NOTE: This is a new experimental feature. Please report any issues you encounter.
DD supports profiling of subgraph execution using the AI Visualizer tool. Follow the steps below to get an op-level performance breakdown. This feature relies on underlying infrastructure from XRT, the NPU driver, and AI Visualizer.
Profiling is enabled by setting the environment variable DD_ENABLE_PROFILE. Its value selects the profiling level:
- Subgraph-level profiling:
set DD_ENABLE_PROFILE=1
- PDI-partition-level profiling:
set DD_ENABLE_PROFILE=2
- All ops:
set DD_ENABLE_PROFILE=3
Profile level 3 is the highest profiling level supported.
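When the application is launched from a script rather than an interactive shell, the same variable can be passed through the child process environment. The following is a minimal Python sketch, not part of DD: the names ProfileLevel, profiling_env, and run_with_profiling are hypothetical helpers, and it assumes DD reads DD_ENABLE_PROFILE from the process environment at startup.

```python
import os
import subprocess
from enum import IntEnum


class ProfileLevel(IntEnum):
    """Values of DD_ENABLE_PROFILE listed above (enum names are hypothetical)."""
    SUBGRAPH = 1
    PDI_PARTITION = 2
    ALL_OPS = 3


def profiling_env(level, base_env=None):
    """Return a copy of the environment with DD_ENABLE_PROFILE set to `level`."""
    level = ProfileLevel(level)  # raises ValueError for values outside 1..3
    env = dict(os.environ if base_env is None else base_env)
    env["DD_ENABLE_PROFILE"] = str(int(level))
    return env


def run_with_profiling(cmd, level, cwd=None):
    """Run `cmd` (a list of arguments) with profiling enabled at `level`.

    `cwd` should be the working directory that contains xrt.ini.
    """
    return subprocess.run(cmd, env=profiling_env(level), cwd=cwd, check=True)
```

Validating the level before launch rejects unsupported values such as 4 in the script, so a typo cannot reach the application unnoticed.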
This feature has been validated with this driver.
Follow the instructions here to add the XDP kernel to the xclbin. Profiling will not work without it. Search for Add XDP_KERNEL in the Confluence page.
Create an xrt.ini file in your working directory with the following contents:
[Debug]
ml_timeline=true
This is also documented in the confluence page linked above.
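If the working directory is created by a script, xrt.ini can be generated as part of that setup. The sketch below uses Python's standard configparser module. The helper name write_xrt_ini is hypothetical, and the only setting it writes is the one shown above.

```python
import configparser
from pathlib import Path


def write_xrt_ini(work_dir="."):
    """Write an xrt.ini with the [Debug] ml_timeline=true setting into work_dir."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config["Debug"] = {"ml_timeline": "true"}

    path = Path(work_dir) / "xrt.ini"
    with path.open("w") as f:
        # No spaces around '=' so the file matches the contents shown above.
        config.write(f, space_around_delimiters=False)
    return path
```

Note that this overwrites any existing xrt.ini in the target directory. Merge by hand if the file already carries other settings.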
Run the application from your working directory. For example:
build\tests\cpp\matmul6\RelWithDebInfo\test_matmul6.exe "C:\temp\tsiddaga\vaip\.cache\e09099812e8c1dfd8028bbd1311dd60a\fused.onnx.subgraph_5.json" 100 xclbin\stx\release\4x2_psf_model_a8w8_qdq.xclbin
If everything works as expected, the following two files are generated in the working directory:
- dd_timestamp_info.json
- record_timer_ts.json
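A quick check that both files exist helps catch a missing XDP kernel or a misplaced xrt.ini before moving on to visualization. This is a minimal sketch. The helper name missing_profile_outputs is hypothetical, and it checks only for the presence of the files, not their contents.

```python
from pathlib import Path

# The two output files named above.
EXPECTED_FILES = ("dd_timestamp_info.json", "record_timer_ts.json")


def missing_profile_outputs(work_dir="."):
    """Return the names of expected profiling outputs not found in work_dir."""
    work_dir = Path(work_dir)
    return [name for name in EXPECTED_FILES if not (work_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_profile_outputs(".")
    if missing:
        print("Profiling output missing:", ", ".join(missing))
    else:
        print("Both profiling output files are present.")
```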
Install the wheel package from this link
Run AI Visualizer to view the performance data:
aianalyzer.exe <work_dir>
<work_dir> is the directory where dd_timestamp_info.json and record_timer_ts.json are saved.
This opens a browser window with the visualization. The tool provides an option to configure the clock frequency. Enter the AIE clock frequency that matches your experiment setup.
Snapshot of visualization for reference: