XRT ChangeLog

2.13.0 (202210.2.13.x)

Added

  • Added the xrt.ini profiling flags "device_counter" and "device_trace"
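As a rough illustration, the two new flags could be written into an xrt.ini file as sketched below. The [Debug] section name and the "true" values are assumptions not stated in this changelog; consult the XRT documentation for the exact section, key spelling, and accepted values.

```python
import configparser
import io

# Assumption: the profiling flags live in a [Debug] section and accept "true".
# Only the flag names themselves come from the changelog entry above.
config = configparser.ConfigParser()
config["Debug"] = {
    "device_counter": "true",
    "device_trace": "true",
}

# Render the configuration to text; write this text to xrt.ini to use it.
buffer = io.StringIO()
config.write(buffer)
xrt_ini_text = buffer.getvalue()
print(xrt_ini_text)
```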

Removed

  • Removed deprecated streaming APIs from OpenCL
  • The xrt.ini flags "profile", "timeline_trace", and "xrt_profile" no longer load xdp profiling functionality and no longer issue a deprecation warning.
  • Deprecated the xrt.ini profiling flags "opencl_summary", "data_transfer_trace", and "opencl_device_counter".

2.12.0 (202120.2.12.x)

Added

xbutil/xbmgmt

  • Added xball helper script to execute a common set of utility commands (e.g., xbutil & xbmgmt) across a filtered set of devices. More information can be found using --help or in the XRT documents.
  • Auto-selecting a device if only one device exists. If the option --device or -d is not specified and there is only one device installed, it will be automatically selected and used.
  • All failing operations will now return an error code. Note: An error will also be returned if there are validation failures.
  • Improved error reporting.
  • Legacy commands have been deprecated.
  • Various report output improvements.
  • xbutil configure root level command introduced. Added host_mem and p2p as commands to configure.
  • --force option support for all operations.
  • xbutil validate now supports alternative platform validation directories.

xclbin

  • PARTITION_METADATA schema updated to support pcie bars.
  • ACCEL_DEADLOCK_DETECTOR enum for Debug IP Layout.

XRT native APIs

  • xrt::aie and xrt::graph moved from experimental to mature and are now available from include/xrt/ folder.
  • Added C++ support for xrt::aie APIs.
  • Throw an exception if xrt::kernel is constructed with an AP_CTRL_NONE kernel. Use xrt::ip to control custom IPs.
  • HLS mailbox support via experimental API include/experimental/xrt_mailbox.h. See tests/xrt/mailbox for an example.
  • HLS kernel reset support using xrt::run::abort(). If a run is aborted without kernel support for sw reset, the board itself will require a reset.
  • Fixed bug in xrt::run::wait where specified timeout was ignored.
  • Added new xrt::device::get_info parameters and guaranteed format of return type with new versions of XRT.

Profiling

  • Profile summary report generated when any profiling option is enabled, no longer just when OpenCL-level profiling is enabled. All applicable summary tables and guidance will be generated based on the profiling options enabled in the xrt.ini file.
  • New data transfer summary table for aggregate information on a memory resource when monitors are added to memory resources in the design.
  • New AIE profiling metric sets to count different AIE events including (1) floating point exceptions in AIE, (2) tile execution counts, and (3) stream puts and gets.

Other changes

  • Added missing : separator in regex when matching the kernel name specified to clCreateKernel and xrt::kernel. Without the separator, matching would fail when a specified kernel name is a substring of another kernel name. The default regex is now "(kernelname):(.*)".
  • Fix register read and write in HW emulation to use the same CU index ordering as the rest of XRT.
  • Fix bugs related to kernel address range size: (1) support custom address range size, (2) trap error when writing outside the kernel address range.
  • Support enabled for RHEL 8.4 and Ubuntu 20.04.2 OS.
  • zocl memory manager improvements to support any sptag.
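The effect of the missing : separator in the kernel name regex can be sketched with a small standalone example. This is not the XRT implementation: the compute unit names and the helper kernel_pattern are hypothetical, and the kernel name is escaped here for simplicity.

```python
import re

def kernel_pattern(name, with_separator=True):
    """Build a pattern of the form "(kernelname):(.*)" (hypothetical helper)."""
    # Assumption: compute unit names look like "<kernel>:<instance>".
    sep = ":" if with_separator else ""
    return re.compile("(" + re.escape(name) + ")" + sep + "(.*)")

# Hypothetical names where kernel "add" is a substring of kernel "adder".
cu_names = ["add:add_1", "adder:adder_1"]

# Without the separator, "add" also matches compute units of "adder".
old = [n for n in cu_names if kernel_pattern("add", with_separator=False).fullmatch(n)]
# With the separator, only compute units of kernel "add" match.
new = [n for n in cu_names if kernel_pattern("add").fullmatch(n)]

print(old)
print(new)
```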

Removed

  • xclExecBufWithWaitList() API is deprecated and will be removed in a future release.
  • Support is removed for RHEL/CentOS 7.6, 7.7 and Ubuntu 16.04.

2.11.0 (202110.2.11.x)

Added

  • Stable native XRT APIs (xrt::device, xrt::kernel, xrt::bo, xrt::uuid) promoted to include/xrt/ folder. See tests/xrt/22_verify/22_verify/main.cpp for an example.
  • Software emulation support has been added for native XRT APIs.
  • API errors are now propagated up the stack as std::system_error exceptions with POSIX error code.
  • Added C++ APIs for AIE graph control and execution.
  • XRT driver debug trace support through debugfs /sys/kernel/debug/xclmgmt/... and /sys/kernel/debug/xocl/...
  • Greatly improved, feature-rich next-generation xbutil and xbmgmt utilities are now the default. The legacy versions of the tools can be invoked by passing --legacy as the first switch to xbutil or xbmgmt.
  • KDS scheduler in xocl has been refactored to significantly improve the throughput across hundreds of processes exercising multiple compute units across multiple devices concurrently.
  • Initial support for PS Kernel -- where a helper application running on APU on platforms like U30 and VCK5000 can be controlled from PCIe host -- has been added.
  • Initial pybind11 bindings for XRT C++ APIs. See tests/python/22_verify/22_verify.py for an example.
  • Initial multi-process support for AIE.
  • New xrt::xclbin experimental C++ API for xclbin introspection at run-time.
  • New xrt::ip experimental C++ API for register and user interrupt access of custom IPs.
  • Implemented a new profiling infrastructure with fine-grained control using xrt.ini keys.
  • AIE performance counters and event trace are now runtime configurable.
  • Support for tracing of native XRT API has been added.
  • Continuous trace offload performance has been improved with buffer reuse. The offload dump interval can be specified in xrt.ini.

Removed

  • XRT streaming APIs used with QDMA PCIe DMA engine have been deprecated. They will be removed in a future release.
  • xcl prefixed HAL APIs have been deprecated from python bindings. They will be removed in a future release. Users should move to xrt prefixed APIs or pybind11 based APIs.

2.10.0 (202020.2.10.x)

Added

Removed

2.9.0 (202020.2.9.x)

Added

  • Implementation of OpenCL changed to use native XRT APIs. This change can trigger detection of errors in OpenCL applications that were not previously reported. For example, if application code attempts to do read-before-write from device memory, an error is now propagated to the application and reported as a sync BO error.
  • Various bugfixes

Removed

2.8.0 (202020.2.8.x)

Added

  • Support for Ubuntu 20.04 and CentOS/RHEL 8.2 has been added.
  • HBM grouping support has been added which allows contiguous banks to be merged into a single group allowing for larger buffer size.
  • Support for AIE graph has been added. New AIE APIs are split into AIE array/shim level APIs in xrt_aie.h and graph level APIs in xrt_graph.h. AIE APIs are moved to libxrt_coreutil.so from libxrt_core.so.
  • pybind11 based Python wrappers have been added for native XRT C++ APIs.
  • Support for PCIe Host Memory has been added which allows user kernels to directly read/write host memory.
  • Support for data-driven two-stage platforms has been added.
  • Slimmed down XRT RPM/DEB package dependencies. XRT package does not depend on other dev/devel packages anymore.
  • Enabled LPDDR for edge platforms
  • xbutil for edge platforms (use xbutil --new)

Removed

  • xbsak has been removed; please use xbutil.

2.7.0 (202010.2.7.x)

Added

  • Support for CentOS and RHEL 7.7, 7.8, and 8.1.
  • All OS versions now use Python3.
  • Native XRT APIs under $XILINX_XRT/include/experimental are subject to change without warning.

Removed

  • Removed all references to python2.
  • Removed automatic installation of PyOpenCL.

2.6.0 (202010.2.6)

Added

  • XRT native APIs for PL kernel have been added. These APIs are defined in new header file xrt_kernel.h. Please see tests/xrt/22_verify/main.cpp and tests/xrt/02_simple/main.cpp for examples. The APIs are also accessible from python. Please see tests/python/22_verify/22_verify.py and tests/python/02_simple/main.py for examples.
  • Support for data-driven platforms has been added. XRT uses PCIe VSEC to identify data-driven platforms. For this class of platforms XRT uses the device tree to discover IPs in the shell and then initializes them.
  • Experimental APIs have been added for AIE control for edge platforms. The APIs are defined in header file xrt_aie.h.
  • Support for U30 video acceleration offload device has been added.
  • Early access versions of next generation utilities, xbutil and xbmgmt are available. They can be invoked via --new switch as xbutil --new.
  • Utilities xbutil and xbmgmt now give a warning when they detect an unsupported Linux distribution version and kernel version.
  • Error code paths for clPollStreams() API have been improved.

Removed

  • Deprecated utilities xclbincat and xclbinsplit have been removed. Please use xclbinutil to work with xclbin files.
  • xclResetDevice() has been marked as deprecated in this release and will be removed in a future release. Please use xbutil reset to reset device.
  • xclUpgradeFirmware(), xclUpgradeFirmware2() and xclUpgradeFirmwareXSpi() have been marked as deprecated in this release and will be removed in a future release. Please use xbmgmt utility to flash device.
  • xclBootFPGA(), xclRemoveAndScanFPGA() and xclRegisterInterruptNotify() have been marked as deprecated in this release and will be removed in a future release. These functionalities are no longer supported.
  • xclLockDevice() and xclUnlockDevice() have been marked as deprecated in this release and will be removed in a future release. These functionalities are no longer supported.
  • This is the last release of XMA legacy APIs. Please port your application to XMA2 APIs.

Known Issues

  • On CentOS the xrtdeps.sh script used to install required dependencies for building XRT is trying to install no longer supported devtoolset-6. In order to build XRT on CentOS or RHEL, a later devtoolset version should be installed, for example devtoolset-9.

2.4.0 (202010.2.4)

Added

  • xclUnmapBO() was added to match xclMapBO(). This new API should be called when unmapping addresses returned by xclMapBO(). On Linux the API ends up calling POSIX munmap() but on Windows the implementation is different.

2.3.0 (201920.2.3)

Added

  • xclRead() and xclWrite() have been marked as deprecated in this release and will be removed in a future release. For direct register access please use replacement APIs xclRegRead() and xclRegWrite() which are more secure and multi-process aware.
  • Edge platforms can now use DFX also known as Partial Reconfiguration.
  • Support for U50 board has been added to XRT.
  • Support for signing xclbins using xclbinutil and validating xclbin signature in xclbin driver has been added to XRT. Please refer to XRT Security documentation https://xilinx.github.io/XRT/2019.2/html/security.html for more details.
  • Edge platforms based on MPSoC now support the M2M feature via the ZynqMP built-in DMA engine. M2M for both PCIe and edge platforms can be performed using the xclCopyBO() XRT API or the clEnqueueCopyBuffer() OCL API. Note that the same APIs can also be used to copy buffers between two devices using PCIe peer-to-peer transfer.
  • For edge platforms XRT now supports ACC (adapter execution model).
  • XRT documentation has been reorganized and significantly updated.
  • XRT now natively supports fully virtualized environments where the management physical function (PF0) is completely hidden in the host and only the user physical function (PF1) is exported to the guest. End-user applications based on libxrt_core and the xbutil command line utility do not need to interact directly with the xclmgmt driver. Communication between the xocl driver and the xclmgmt driver is done over the hardware mailbox and the MPD/MSD framework. For more information refer to the MPD/MSD and Mailbox sections in the XRT documentation.
  • Management Physical Function (PF0) should now be managed using the xbmgmt utility, which is geared towards system administrators. xbutil continues to be the end-user facing utility.
  • Support has been added for device memory only buffer with no backing shadow buffer in host on PCIe platforms. To allocate such buffers use XCL_BO_FLAGS_DEV_ONLY in flags field of xclAllocBO() or CL_MEM_HOST_NO_ACCESS in flags field of OCL API.
  • XRT now has integrated support for Linux hwmon. Run Linux sensors utility to see all the sensor values exported by Alveo/XRT.
  • XRT now has production support for edge platforms. The following non-DFX edge platforms are supported: zcu102_base, zcu104_base, zc702, zc706. In addition, the zcu102_base_dfx platform has DFX support.
  • Emulation and HW profiling support has been enabled for all the above-mentioned edge platforms. Zynq MPSoC platforms zcu102_base, zcu104_base and zcu102_base_dfx also have emulation profiling enabled.
  • Improved handling of PCIe reset via xbutil reset which resolves system crash observed on some servers.
  • Resource management has been moved out of XMA library.
  • Only signed xclbins can be loaded on systems running in UEFI secure boot mode. You can use the DKMS key used to sign XRT drivers to sign xclbins as well. As root, use the following command to sign an xclbin with the DKMS UEFI key: xclbinutil --private-key /var/lib/shim-signed/mok/MOK.priv --certificate /var/lib/shim-signed/mok/MOK.der --input a.xclbin --output signed.xclbin
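The signing command above can also be assembled from a script, for example when signing several xclbins in a build flow. The sketch below only builds the argument list using the options quoted in the entry above; build_sign_command is a hypothetical helper, and actually running the command requires root and an XRT installation.

```python
import shlex

def build_sign_command(private_key, certificate, input_xclbin, output_xclbin):
    """Return the xclbinutil signing command as an argument list (hypothetical helper)."""
    return [
        "xclbinutil",
        "--private-key", private_key,
        "--certificate", certificate,
        "--input", input_xclbin,
        "--output", output_xclbin,
    ]

cmd = build_sign_command(
    "/var/lib/shim-signed/mok/MOK.priv",
    "/var/lib/shim-signed/mok/MOK.der",
    "a.xclbin",
    "signed.xclbin",
)
command_line = shlex.join(cmd)
print(command_line)

# To execute it on a host with XRT installed (as root):
#   import subprocess
#   subprocess.run(cmd, check=True)
```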

Known Issues

  • On the U280 platform, downloading an XCLBIN resets the P2P BAR size back to 256M internally. XRT works around this issue by reading the BAR size register and writing back the same value. This sets the P2P BAR size back to the value it had before the XCLBIN was downloaded.
  • On edge platforms intermittent hang is observed when downloading different xclbins multiple times while CU interrupt is enabled.
  • Dynamic clock scaling is not enabled for edge platforms.
  • On PPC64LE xbutil reset uses PCIe fundamental reset effectively reloading the platform from PROM. Note on x86_64 xbutil reset continues to use PCIe warm reset which just resets the shell and the dynamic region without reloading the platform from PROM.

2.2.0 (201910.2.2)

Added

  • Production support for QDMA (Xilinx PCIe Streaming DMA) engine has been added to XRT. Applications can use Xilinx streaming extension APIs defined in cl_ext_xilinx.h to work with streams on QDMA platforms like xilinx_u200_qdma_201910_1. Look for examples on https://github.com/Xilinx/SDAccel_Examples.
  • PCIe peer-to-peer functionality is fully supported. Please consult https://xilinx.github.io/XRT/2019.1/html/p2p.html for details on how to set up the PCIe peer-to-peer BAR and for host system requirements. P2P buffers are created by passing the XCL_MEM_EXT_P2P_BUFFER flag to the clCreateBuffer() API. Peer PCIe devices like NVMe can directly DMA from/to P2P buffers. P2P transfers between two Alveo™ boards can be triggered through the standard clEnqueueCopyBuffer() API.
  • Support has been added for AP_CTRL_CHAIN (data-flow) and AP_CTRL_NONE (streaming) execution models. The XRT scheduler (including hardware accelerated ERT) has been updated to handle the new execution models. xclbin tools have been updated to annotate xclbin IP_LAYOUT entries with suitable tags to pass the execution model information to XRT.
  • Memory to memory (M2M) hardware accelerated transfers from one DDR bank to another within a device can be effected on platforms with M2M IP via the standard clEnqueueCopyBuffer() API.
  • XRT now looks for xrt.ini configuration file and if not found looks for legacy sdaccel.ini configuration file. If not found in usual search directories the files are now also searched in working directory.
  • Embedded platforms based on Zynq MPSoC US+™ are fully supported. For reference designs please explore reVISION™ stack from Xilinx. Embedded platforms now use interrupts for CU completion notification, significantly reducing ARM CPU usage.
  • Profiling support has been extended to embedded platforms with timeline trace and profile summary.
  • XRT now makes no assumption about CU base addresses on embedded platforms. CU base addresses can be completely floating and are discovered from IP_LAYOUT section of xclbin.
  • XMA (Xilinx Media Accelerator) is now fully integrated into XRT by using the common config reader and messaging framework (also shared by OCL) provided by XRT core.
  • XMA uses XRT core framework for scheduling tasks on encoder/decoder/scaler. New XMA APIs provide a method to prepare register write command packet, send the write command to XRT and then wait for completion of one or more command submissions. Please look at https://github.com/Xilinx/xma-samples for recommended way to write XMA plugins and design video IP control interface.
  • Multiple process mode is on by default in this release. This means multiple user processes can simultaneously use the same CU on a board. XRT does time division multiplexing. Note there is no support for pre-emption. In multi-process run only the first process gets profiling support.
  • OCL can perform automatic binding of cl_mem to DDR bank by using several heuristics like kernel argument index and kernel instance information. The API clCreateKernel is enhanced to accept annotated CU name(s) to fetch asymmetrical compute units (If all the CUs of a kernel have exact same port maps or port connections they are symmetrical compute units, otherwise CUs are asymmetrical) and streaming compute units.
  • XRT now reports an error if it cannot identify the buffer location (in earlier releases it assumed a default location). Remedies: (a) check the kernel XCLBIN to make sure the kernel argument corresponding to the buffer is mapped to device memory properly; (b) use clSetKernelArg before any enqueue operation on the buffer.
  • Host applications directly linking with libxilinxopencl.so must use -Wl,-rpath-link,$(XILINX_XRT)/lib in the linker line. Host applications linking with ICD loader, libOpenCL.so do not need to change.
  • xbutil top now reports live CU usage metric.
  • xclbincat and xclbinsplit are deprecated by xclbinutil. These deprecated tools are currently scheduled to be obsoleted in the next release.
  • Profiling subsystem has been enhanced to show dataflow, PCIe peer to peer transfers, M2M transfers and kernel to kernel streaming information.
  • XRT has switched to the new header file xrt.h in place of xclhal2.h. The latter is still around for backwards compatibility but simply includes xrt.h (via #include) for all definitions. A new file xrt-next.h has been added for experimental features.
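The configuration file fallback described above (xrt.ini first, then legacy sdaccel.ini, with the working directory also searched) can be sketched as follows. This is an illustration, not XRT's implementation: find_config is a hypothetical helper, and the precedence across directories is an assumption since the changelog does not spell it out.

```python
from pathlib import Path
import tempfile

def find_config(search_dirs):
    """Return the first config file found, preferring xrt.ini over sdaccel.ini."""
    # Assumption: every directory is checked for xrt.ini first, and the legacy
    # sdaccel.ini is considered only if no xrt.ini exists in any directory.
    for filename in ("xrt.ini", "sdaccel.ini"):
        for directory in search_dirs:
            candidate = Path(directory) / filename
            if candidate.is_file():
                return candidate
    return None

# Demonstrate with two temporary directories standing in for a usual search
# directory and the working directory.
with tempfile.TemporaryDirectory() as usual_dir, tempfile.TemporaryDirectory() as work_dir:
    (Path(work_dir) / "sdaccel.ini").write_text("[Debug]\n")
    legacy_only = find_config([usual_dir, work_dir]).name

    (Path(work_dir) / "xrt.ini").write_text("[Debug]\n")
    preferred = find_config([usual_dir, work_dir]).name

print(legacy_only)
print(preferred)
```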

2.1.0 (201830.2.1)

Added

  • xbutil can now generate output in JSON format for easy parsing by other tools. Use xbutil dump to generate JSON output on stdout.
  • Initial support for PCIe peer-to-peer transactions has been added. Please consult https://xilinx.github.io/XRT/2018.3/html/p2p.html for details.
  • 64-bit BARs in Alveo shells are natively supported.
  • Initial implementation of XRT logging API, xclLogMsg() for use by XRT clients.
  • Initial support for Alveo shell KDMA feature in OpenCL.
  • Yocto recipes to build XRT for embedded platforms. Please consult https://xilinx.github.io/XRT/2018.3/html/yocto.html for details.

Fixed

  • xbutil flash -a PROM corruption issue with multiple Alveo boards.
  • XRT scheduling bug with multiple boards on AWS F1 when scheduler was serializing board access.
  • xocl kernel driver bugs in handling multiple processes accessing the same device.
  • PPC64LE build failure.
  • Several core QDMA driver fixes.
  • xocl scheduler thread now yields correctly when running in polling mode.
  • Several Coverity/Fortify code scan fixes.

Deprecated

  • XMA plugin API xma_plg_register_write has been marked for deprecation. It will be removed in a future release.
  • XMA plugin API xma_plg_register_read has been marked for deprecation. It will be removed in a future release.