- Added the xrt.ini profiling flags "device_counter" and "device_trace"
- Removed deprecated streaming APIs from OpenCL
- xrt.ini flags "profile", "timeline_trace", and "xrt_profile" no longer load XDP profiling functionality and no longer issue a deprecation warning
- Deprecated the xrt.ini profiling flags "opencl_summary", "data_transfer_trace", and "opencl_device_counter"
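The three groups of profiling flags above can be summarized as a small lookup. The sketch below is a hypothetical C++ helper, not XRT's configuration reader. It assumes the flags sit in a [Debug] section of xrt.ini written as key=value lines. The names classify_profiling_flag and debug_keys are illustrative only.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical classification of the xrt.ini profiling flags named in these notes.
enum class FlagStatus { Added, Deprecated, Ignored, Unknown };

FlagStatus classify_profiling_flag(const std::string& key) {
  if (key == "device_counter" || key == "device_trace")
    return FlagStatus::Added;
  if (key == "opencl_summary" || key == "data_transfer_trace" ||
      key == "opencl_device_counter")
    return FlagStatus::Deprecated;
  if (key == "profile" || key == "timeline_trace" || key == "xrt_profile")
    return FlagStatus::Ignored;  // no longer loads XDP, no deprecation warning
  return FlagStatus::Unknown;
}

// Collect the keys of the [Debug] section from INI-style text.
// Simplified on purpose: no whitespace trimming, no inline comments.
std::vector<std::string> debug_keys(const std::string& ini_text) {
  std::istringstream in(ini_text);
  std::string line, section;
  std::vector<std::string> keys;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == ';' || line[0] == '#') continue;
    if (line.front() == '[' && line.back() == ']') {
      section = line.substr(1, line.size() - 2);  // e.g. "Debug"
      continue;
    }
    auto eq = line.find('=');
    if (section == "Debug" && eq != std::string::npos)
      keys.push_back(line.substr(0, eq));
  }
  return keys;
}
```

For example, a [Debug] section containing device_trace, profile and opencl_summary would classify as Added, Ignored and Deprecated respectively.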
xbutil/xbmgmt
- Added the xball helper script to execute a common set of utility commands (e.g., xbutil and xbmgmt) across a filtered set of devices. More information can be found using --help or in the XRT documentation.
- Auto-selecting a device if only one device exists. If the option --device or -d is specified and there is only one device installed, it will be automatically selected and used.
- All failing operations now return an error code. Note: an error is also returned if there are validation failures.
- Improved error reporting.
- Legacy commands have been deprecated.
- Various report output improvements.
- xbutil configure root-level command introduced. Added host_mem and p2p as commands to configure.
- Added --force option support for all operations.
- xbutil validate now supports alternative platform validation directories.
xclbin
- PARTITION_METADATA schema updated to support PCIe BARs.
- Added ACCEL_DEADLOCK_DETECTOR enum for Debug IP Layout.
XRT native APIs
- xrt::aie and xrt::graph moved from experimental to mature and are now available from the include/xrt/ folder.
- Added C++ support for xrt::aie APIs.
- xrt::kernel now throws an exception if it is constructed with an AP_CTRL_NONE kernel. Use xrt::ip to control custom IPs.
- HLS mailbox support via the experimental API include/experimental/xrt_mailbox.h. See tests/xrt/mailbox for an example.
- HLS kernel reset support using xrt::run::abort(). If a run is aborted and the kernel does not support software reset, the board itself will require a reset.
- Fixed a bug in xrt::run::wait where the specified timeout was ignored.
- Added new xrt::device::get_info parameters, and the format of the return type is now guaranteed across new versions of XRT.
Profiling
- The profile summary report is now generated when any profiling option is enabled, not only when OpenCL-level profiling is enabled. All applicable summary tables and guidance are generated based on the profiling options enabled in the xrt.ini file.
- New data transfer summary table for aggregate information on a memory resource when monitors are added to memory resources in the design.
- New AIE profiling metric sets to count different AIE events including (1) floating point exceptions in AIE, (2) tile execution counts, and (3) stream puts and gets.
Other changes
- Added the missing ":" separator in the regex used to match the kernel name specified to clCreateKernel and xrt::kernel. Without the separator, matching would fail when a specified kernel name is a substring of another kernel name. The default regex is now "(kernelname):(.*)".
- Fixed register read and write in HW emulation to use the same CU index ordering as the rest of XRT.
- Fixed bugs related to kernel address range size: (1) support custom address range sizes, (2) trap an error when writing outside the kernel address range.
- Support enabled for RHEL 8.4 and Ubuntu 20.04.2.
- zocl memory manager improvements to support any sptag.
- The xclExecBufWithWaitList() API is deprecated and will be removed in a future release.
- Support is removed for RHEL/CentOS 7.6, 7.7 and Ubuntu 16.04.
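The effect of the missing ":" separator can be reproduced with std::regex. This sketch assumes compute-unit names of the form kernel:cu, which is the shape the default regex "(kernelname):(.*)" expects. cu_matches is a hypothetical helper, not the code XRT uses internally.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Match a requested kernel name against a CU name of the form "kernel:cu".
// With the ':' separator, the kernel name must match the whole part before
// the colon. Without it, any CU whose name merely starts with the requested
// kernel name also matches.
// Note: kernel_name is used verbatim as a regex here; real code should
// escape regex metacharacters first.
bool cu_matches(const std::string& kernel_name, const std::string& cu_name,
                bool with_separator) {
  const std::string pattern = with_separator ? "(" + kernel_name + "):(.*)"
                                             : "(" + kernel_name + ")(.*)";
  return std::regex_match(cu_name, std::regex(pattern));
}
```

With the separator, a request for vadd matches vadd:vadd_1 but not vadd_wide:vadd_wide_1. Without it, both match.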
- Stable native XRT APIs (xrt::device, xrt::kernel, xrt::bo, xrt::uuid) promoted to the include/xrt/ folder. See tests/xrt/22_verify/22_verify/main.cpp for an example.
- Software emulation support has been added for native XRT APIs.
- API errors are now propagated up the stack as std::system_error exceptions with a POSIX error code.
- Added C++ APIs for AIE graph control and execution.
- XRT driver debug trace support through debugfs: /sys/kernel/debug/xclmgmt/... and /sys/kernel/debug/xocl/...
- Greatly improved, full-featured next-generation xbutil and xbmgmt utilities are now the default. The legacy versions of the tools can be invoked by passing --legacy as the first switch to the xbutil or xbmgmt invocation.
- KDS scheduler in xocl has been refactored to significantly improve the throughput across hundreds of processes exercising multiple compute units across multiple devices concurrently.
- Initial support for PS Kernel -- where a helper application running on APU on platforms like U30 and VCK5000 can be controlled from PCIe host -- has been added.
- Initial pybind11 bindings for XRT C++ APIs. See tests/python/22_verify/22_verify.py for an example.
- Initial multi-process support for AIE.
- New xrt::xclbin experimental C++ API for xclbin introspection at run-time.
- New xrt::ip experimental C++ API for register and user interrupt access of custom IPs.
- Implemented a new profiling infrastructure with fine-grained control using xrt.ini keys
- AIE performance counters and event trace are now runtime configurable.
- Support for tracing of native XRT API has been added.
- Continuous trace offload performance has been improved with buffer reuse. The offload dump interval can be specified in xrt.ini.
- XRT streaming APIs used with QDMA PCIe DMA engine have been deprecated. They will be removed in a future release.
- xcl prefixed HAL APIs have been deprecated from python bindings. They will be removed in a future release. Users should move to xrt prefixed APIs or pybind11 based APIs.
- Implementation of OpenCL changed to use native XRT APIs. This change can trigger detection of errors in OpenCL applications that were not previously reported. For example, if application code attempts a read-before-write from device memory, an error is now propagated to the application and reported as a sync BO error.
- Various bugfixes
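API errors in this release are propagated as std::system_error exceptions carrying a POSIX error code. Calling code can therefore branch on the code instead of parsing messages. The sketch below uses a hypothetical open_device_or_throw stand-in rather than a real XRT call. It assumes the code is reported in std::generic_category().

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical stand-in for an API call that fails the way these notes
// describe: it throws std::system_error carrying a POSIX errno value.
void open_device_or_throw(unsigned int index, unsigned int device_count) {
  if (index >= device_count)
    throw std::system_error(ENODEV, std::generic_category(),
                            "no device at index " + std::to_string(index));
}

// Caller side: catch the exception and return the POSIX code (0 on success).
int try_open(unsigned int index, unsigned int device_count) {
  try {
    open_device_or_throw(index, device_count);
    return 0;
  } catch (const std::system_error& e) {
    // e.code().value() is the errno value; e.what() carries the message.
    return e.code().value();
  }
}
```

Comparing e.code() against a portable std::errc value, such as std::errc::no_such_device, avoids hard-coding platform errno numbers.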
- Support for Ubuntu 20.04 and CentOS/RHEL 8.2 has been added.
- HBM grouping support has been added which allows contiguous banks to be merged into a single group allowing for larger buffer size.
- Support for AIE graph has been added. New AIE APIs are split into AIE array/shim level APIs in xrt_aie.h and graph level APIs in xrt_graph.h. AIE APIs are moved to libxrt_coreutil.so from libxrt_core.so.
- pybind11 based Python wrappers have been added for native XRT C++ APIs.
- Support for PCIe Host Memory has been added which allows user kernels to directly read/write host memory.
- Support for data-driven two-stage platforms has been added.
- Slimmed down XRT RPM/DEB package dependencies. XRT package does not depend on other dev/devel packages anymore.
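The gain from HBM grouping is easiest to see as arithmetic: a buffer can span a whole run of contiguous banks instead of a single bank. The sketch below is illustrative only. It assumes sorted, distinct bank indices and equal bank sizes. BankGroup, group_contiguous and max_buffer_bytes are hypothetical names, not XRT APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A group is a run of contiguous bank indices [first, last].
struct BankGroup {
  int first;
  int last;
};

// Merge sorted, distinct bank indices into runs of contiguous banks.
std::vector<BankGroup> group_contiguous(const std::vector<int>& banks) {
  std::vector<BankGroup> groups;
  for (int b : banks) {
    if (!groups.empty() && groups.back().last + 1 == b)
      groups.back().last = b;    // extends the current run
    else
      groups.push_back({b, b});  // starts a new run
  }
  return groups;
}

// Largest buffer that fits in a single group, assuming equal bank sizes.
std::uint64_t max_buffer_bytes(const std::vector<BankGroup>& groups,
                               std::uint64_t bank_bytes) {
  std::uint64_t best = 0;
  for (const auto& g : groups) {
    const std::uint64_t bytes =
        static_cast<std::uint64_t>(g.last - g.first + 1) * bank_bytes;
    if (bytes > best) best = bytes;
  }
  return best;
}
```

Take banks 0-3 and 8-9 with an assumed 256 MiB per bank. The largest buffer fits in the 0-3 group: 4 × 256 MiB = 1 GiB.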
- Enabled LPDDR for edge platforms
- xbutil for edge platforms (use xbutil --new)
- xbsak, please use xbutil
- Support for CentOS and RHEL 7.7, 7.8, and 8.1.
- All OS versions now use Python3.
- Native XRT APIs under $XILINX_XRT/include/experimental are subject to change without warning.
- Removed all references to python2.
- Removed automatic installation of PyOpenCL.
- XRT native APIs for PL kernels have been added. These APIs are defined in the new header file xrt_kernel.h. Please see tests/xrt/22_verify/main.cpp and tests/xrt/02_simple/main.cpp for examples. The APIs are also accessible from Python. Please see tests/python/22_verify/22_verify.py and tests/python/02_simple/main.py for examples.
- Support for data-driven platforms has been added. XRT uses PCIe VSEC to identify data-driven platforms. For this class of platforms, XRT uses the device tree to discover IPs in the shell and then initializes them.
- Experimental APIs have been added for AIE control on edge platforms. The APIs are defined in the header file xrt_aie.h.
- Support for the U30 video acceleration offload device has been added.
- Early access versions of the next generation utilities, xbutil and xbmgmt, are available. They can be invoked via the --new switch, as in xbutil --new.
- Utilities xbutil and xbmgmt now give a warning when they detect an unsupported Linux distribution version and kernel version.
- Error code paths for the clPollStreams() API have been improved.
- Deprecated utilities xclbincat and xclbinsplit have been removed. Please use xclbinutil to work with xclbin files.
- xclResetDevice() has been marked as deprecated in this release and will be removed in a future release. Please use xbutil reset to reset the device.
- xclUpgradeFirmware(), xclUpgradeFirmware2() and xclUpgradeFirmwareXSpi() have been marked as deprecated in this release and will be removed in a future release. Please use the xbmgmt utility to flash the device.
- xclBootFPGA(), xclRemoveAndScanFPGA() and xclRegisterInterruptNotify() have been marked as deprecated in this release and will be removed in a future release. These functionalities are no longer supported.
- xclLockDevice() and xclUnlockDevice() have been marked as deprecated in this release and will be removed in a future release. These functionalities are no longer supported.
- This is the last release of XMA legacy APIs. Please port your application to XMA2 APIs.
- On CentOS, the xrtdeps.sh script used to install the dependencies required for building XRT tries to install devtoolset-6, which is no longer supported. To build XRT on CentOS or RHEL, install a later devtoolset version, for example devtoolset-9.
- xclUnmapBO() was added to match xclMapBO(). This new API should be called when unmapping addresses returned by xclMapBO(). On Linux the API ends up calling POSIX munmap(), but on Windows the implementation is different.
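Every address returned by xclMapBO() should be released with xclUnmapBO(). That is the usual map/unmap pairing, which RAII enforces in C++. The sketch below uses anonymous memory as a stand-in and does not call XRT. It maps with POSIX mmap() and releases with munmap() in the destructor, which is what xclUnmapBO() ends up calling on Linux according to the note above. The Mapping class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// RAII guard that pairs every successful mapping with exactly one munmap().
// Anonymous memory stands in for a buffer-object mapping; this only
// illustrates the pairing discipline and does not touch XRT.
class Mapping {
 public:
  explicit Mapping(std::size_t size)
      : size_(size),
        addr_(mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) {}

  ~Mapping() {
    if (valid()) munmap(addr_, size_);  // released exactly once
  }

  // Copying would lead to a double unmap, so forbid it.
  Mapping(const Mapping&) = delete;
  Mapping& operator=(const Mapping&) = delete;

  bool valid() const { return addr_ != MAP_FAILED; }
  void* data() const { return valid() ? addr_ : nullptr; }
  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  void* addr_;
};
```

A guard for a real buffer object would keep the handle and mapped address and call xclUnmapBO() in its destructor instead of munmap().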
- xclRead() and xclWrite() have been marked as deprecated in this release and will be removed in a future release. For direct register access, please use the replacement APIs xclRegRead() and xclRegWrite(), which are more secure and multi-process aware.
- Edge platforms can now use DFX, also known as Partial Reconfiguration.
- Support for U50 board has been added to XRT.
- Support for signing xclbins using xclbinutil and validating xclbin signature in xclbin driver has been added to XRT. Please refer to XRT Security documentation https://xilinx.github.io/XRT/2019.2/html/security.html for more details.
- Edge platforms based on MPSoC now support the M2M feature via the ZynqMP built-in DMA engine. M2M for both PCIe and edge platforms can be performed using the xclCopyBO() XRT API or the clEnqueueCopyBuffer() OCL API. Note that the same APIs can also be used to copy buffers between two devices using PCIe peer-to-peer transfer.
- For edge platforms, XRT now supports ACC (adapter execution model).
- XRT documentation has been reorganized and significantly updated.
- XRT now natively supports fully virtualized environments where the management physical function (PF0) is completely hidden in the host and only the user physical function (PF1) is exported to the guest. End-user applications based on libxrt_core and the xbutil command line utility do not need to directly interact with the xclmgmt driver. Communication between the xocl driver and the xclmgmt driver is done over a hardware mailbox and the MPD/MSD framework. For more information, refer to the MPD/MSD and Mailbox sections in the XRT documentation.
- Management Physical Function (PF0) should now be managed using the xbmgmt utility, which is geared towards system administrators. xbutil continues to be the end-user facing utility.
- Support has been added for device-memory-only buffers with no backing shadow buffer in the host on PCIe platforms. To allocate such buffers, use XCL_BO_FLAGS_DEV_ONLY in the flags field of xclAllocBO() or CL_MEM_HOST_NO_ACCESS in the flags field of the OCL API.
- XRT now has integrated support for Linux hwmon. Run the Linux sensors utility to see all the sensor values exported by Alveo/XRT.
- XRT now has production support for edge platforms. The following non-DFX edge platforms are supported: zcu102_base, zcu104_base, zc702, zc706. In addition, the zcu102_base_dfx platform has DFX support.
- Emulation and HW profiling support has been enabled for all the above-mentioned edge platforms. Zynq MPSoC platforms zcu102_base, zcu104_base and zcu102_base_dfx also have emulation profiling enabled.
- Improved handling of PCIe reset via xbutil reset, which resolves a system crash observed on some servers.
- Resource management has been moved out of the XMA library.
- Only signed xclbins can be loaded on systems running in UEFI secure boot mode. You can also use the DKMS key that signs the XRT drivers to sign xclbins. As root, use the following command to sign an xclbin with the DKMS UEFI key:
xclbinutil --private-key /var/lib/shim-signed/mok/MOK.priv --certificate /var/lib/shim-signed/mok/MOK.der --input a.xclbin --output signed.xclbin
- On the U280 platform, downloading an XCLBIN resets the P2P BAR size back to 256M internally. XRT works around this issue by reading the BAR size register and writing back the same value. This restores the P2P BAR size to the value it had before the XCLBIN was downloaded.
- On edge platforms intermittent hang is observed when downloading different xclbins multiple times while CU interrupt is enabled.
- Dynamic clock scaling is not enabled for edge platforms.
- On PPC64LE, xbutil reset uses PCIe fundamental reset, effectively reloading the platform from PROM. Note that on x86_64, xbutil reset continues to use PCIe warm reset, which just resets the shell and the dynamic region without reloading the platform from PROM.
- Production support for QDMA (Xilinx PCIe Streaming DMA) engine has been added to XRT. Applications can use Xilinx streaming extension APIs defined in cl_ext_xilinx.h to work with streams on QDMA platforms like xilinx_u200_qdma_201910_1. Look for examples on https://github.com/Xilinx/SDAccel_Examples.
- PCIe peer-to-peer functionality is fully supported. Please consult https://xilinx.github.io/XRT/2019.1/html/p2p.html for details on how to set up the PCIe peer-to-peer BAR and host system requirements. P2P buffers are created by passing the XCL_MEM_EXT_P2P_BUFFER flag to the clCreateBuffer() API. Peer PCIe devices like NVMe can directly DMA from/to P2P buffers. P2P transfers between two Alveo™ boards can be triggered through the standard clEnqueueCopyBuffer() API.
- Support has been added for AP_CTRL_CHAIN (data-flow) and AP_CTRL_NONE (streaming) execution models. The XRT scheduler (including hardware-accelerated ERT) has been updated to handle the new execution models. xclbin tools have been updated to annotate xclbin IP_LAYOUT entries with suitable tags to pass the execution model information to XRT.
- Memory to memory (M2M) hardware-accelerated transfers from one DDR bank to another within a device can be effected on platforms with M2M IP via the standard clEnqueueCopyBuffer() API.
- XRT now looks for the xrt.ini configuration file and, if it is not found, looks for the legacy sdaccel.ini configuration file. If neither file is found in the usual search directories, XRT now also searches the working directory.
- Embedded platforms based on Zynq MPSoC US+™ are fully supported. For reference designs, please explore the reVISION™ stack from Xilinx. Embedded platforms now use interrupts for CU completion notification, significantly reducing ARM CPU usage.
- Profiling support has been extended to embedded platforms with timeline trace and profile summary.
- XRT now makes no assumption about CU base addresses on embedded platforms. CU base addresses can be completely floating and are discovered from the IP_LAYOUT section of the xclbin.
- XMA (Xilinx Media Accelerator) is now fully integrated into XRT by using the common config reader and messaging framework (also shared by OCL) provided by XRT core.
- XMA uses XRT core framework for scheduling tasks on encoder/decoder/scaler. New XMA APIs provide a method to prepare register write command packet, send the write command to XRT and then wait for completion of one or more command submissions. Please look at https://github.com/Xilinx/xma-samples for recommended way to write XMA plugins and design video IP control interface.
- Multiple process mode is on by default in this release. This means multiple user processes can simultaneously use the same CU on a board; XRT does time-division multiplexing. Note that there is no support for pre-emption. In a multi-process run, only the first process gets profiling support.
- OCL can perform automatic binding of cl_mem to a DDR bank by using several heuristics, such as kernel argument index and kernel instance information. The clCreateKernel API is enhanced to accept annotated CU name(s) to fetch asymmetrical compute units and streaming compute units. If all the CUs of a kernel have exactly the same port maps or port connections, they are symmetrical compute units; otherwise they are asymmetrical.
- XRT will give an error if it cannot identify the buffer location (in earlier releases it used to assume a default location). Remedies: a) check the kernel XCLBIN to make sure the kernel argument corresponding to the buffer is mapped to device memory properly; b) use clSetKernelArg before any enqueue operation on the buffer.
- Host applications directly linking with libxilinxopencl.so must use -Wl,-rpath-link,$(XILINX_XRT)/lib in the linker line. Host applications linking with the ICD loader, libOpenCL.so, do not need to change.
- xbutil top now reports a live CU usage metric.
- xclbincat and xclbinsplit are deprecated in favor of xclbinutil. These deprecated tools are currently scheduled to be obsoleted in the next release.
- Profiling subsystem has been enhanced to show dataflow, PCIe peer-to-peer transfers, M2M transfers and kernel-to-kernel streaming information.
- XRT has switched to the new header file xrt.h in place of xclhal2.h. The latter is still around for backwards compatibility, but it includes xrt.h for all definitions. A new file, xrt-next.h, has been added for experimental features.
- xbutil can now generate output in JSON format for easy parsing by other tools. Use xbutil dump to generate JSON output on stdout.
- Initial support for PCIe peer-to-peer transactions has been added. Please consult https://xilinx.github.io/XRT/2018.3/html/p2p.html for details.
- 64-bit BARs in Alveo shells are natively supported.
- Initial implementation of XRT logging API, xclLogMsg() for use by XRT clients.
- Initial support for Alveo shell KDMA feature in OpenCL.
- Yocto recipes to build XRT for embedded platforms. Please consult https://xilinx.github.io/XRT/2018.3/html/yocto.html for details.
- xbutil flash -a PROM corruption issue with multiple Alveo boards.
- XRT scheduling bug with multiple boards on AWS F1 when scheduler was serializing board access.
- xocl kernel driver bugs in handling multiple processes accessing the same device.
- PPC64LE build failure.
- Several core QDMA driver fixes.
- xocl scheduler thread now yields correctly when running in polling mode.
- Several Coverity/Fortify code scan fixes.
- XMA plugin API xma_plg_register_write has been marked for deprecation. It will be removed in a future release.
- XMA plugin API xma_plg_register_read has been marked for deprecation. It will be removed in a future release.