Releases: amd/blis

AOCL-BLAS 5.0

11 Oct 03:28

AOCL-BLAS 5.0 Release Highlights

  • Added zen5 support
  • Turin specific tuning for the APIs: D/ZGEMM, DTRSM and DNRM2
  • AVX512 improvements for the APIs: ZGEMV, D/ZAXPYF, D/ZDOTXF, ZDOTV, C/ZSCALV, DNRM2, S/D/ZCOPY, S/D/C/ZAXPBYV, DTRSV, DGEMMT, D/ZTRSM, and D/ZGEMM
  • Improvements to the AOCL_ENABLE_INSTRUCTIONS functionality
  • Additional APIs and post-ops support, plus improved performance for existing APIs, in the aocl_gemm add-on

AOCL-BLAS 4.2

28 Feb 06:32

AOCL-BLAS 4.2 Release Highlights

  • Added uint8 output and zero-point support in int8 APIs in aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Improved performance for all downscaled versions of all APIs in aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Improved multithreaded performance across APIs in aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Introduced AOCL_ENABLE_INSTRUCTIONS environment variable as an alternative to BLIS_ARCH_TYPE, but with slightly different semantics.
  • Improved functionality of XERBLA error handling routine in AOCL-BLAS.
  • Performance optimizations for the following APIs:
    - DGEMM for tiny sizes
    - S/ZGEMM, D/ZTRSM, ZAXPBYV, Z/ZDSCALV, S/D/ZGEMV, and D/DZNRM2
  • The following BLAS extension APIs have been added, for AMD “Zen” code paths only:
    - sgemm_pack_get_size(), sgemm_pack(), and sgemm_compute()
    - dgemm_pack_get_size(), dgemm_pack(), and dgemm_compute()
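
The AOCL_ENABLE_INSTRUCTIONS variable mentioned above is read from the process environment, so one way to apply it is to set it for a child process. The following is a minimal sketch, not an official recipe: the value "avx2" and the helper name env_with_instruction_cap are assumptions made for illustration, and the accepted values and exact semantics should be checked against the AOCL user guide.

```python
import os


def env_with_instruction_cap(isa, base_env=None):
    """Return a copy of an environment with AOCL_ENABLE_INSTRUCTIONS set.

    `isa` is passed through unchanged; this helper (a hypothetical name)
    does not validate it against the values AOCL-BLAS accepts.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AOCL_ENABLE_INSTRUCTIONS"] = isa
    return env


# "avx2" is an assumed example value, not taken from the release notes.
env = env_with_instruction_cap("avx2", base_env={"PATH": "/usr/bin"})
print(env["AOCL_ENABLE_INSTRUCTIONS"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching a program linked against AOCL-BLAS. Copying the environment rather than mutating os.environ keeps the setting scoped to that one child process.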

AOCL-BLAS 4.1

07 Aug 15:39

AOCL-BLAS 4.1 Release Highlights

  • Additional APIs and post-ops support, plus improved performance for existing APIs, in the aocl_gemm add-on
  • Dynamic dispatch and amdzen configuration support added to aocl_gemm add-on
  • Dynamic dispatch feature enhancements.
  • AVX 512-based optimizations for AMD “Zen4” platform:
    - SGEMM, DGEMM, and ZGEMM
    - DTRSM, D/ZAXPY, ZGEMV, DDOTV, and D/ZSCALV
  • Improved support for OpenMP nested parallelism.

AOCL-BLIS 4.0

13 Nov 07:06

Highlights of AOCL-BLIS 4.0

  • The following LPGEMM (Low Precision GEMM) variants are added along with post-ops support:
    • aocl_gemm_u8s8s32os32 and aocl_gemm_u8s8s32os8 routines are added and optimized using AVX-512-VNNI
    • aocl_gemm_u8s8s16os16 and aocl_gemm_u8s8s16os8 routines are added and optimized using AVX2
    • aocl_gemm_bf16bf16f32of32 and aocl_gemm_bf16bf16f32obf16 routines are added and optimized using AVX-512
  • SGEMM with packed/reorder buffer support (aocl_gemm_f32f32f32f32)
  • AMD “Zen4” support for BLIS
  • Dynamic dispatch supports AMD “Zen4” configuration
  • Optimizations and performance improvements for DGEMM, SGEMM, ZGEMM, DGEMMT, and DTRSM
  • Framework design changes
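
The LPGEMM routine names above appear to encode their data types. The sketch below assumes the reading "u8 A matrix, s8 B matrix, s32 accumulation, s32 or s8 output", which is an interpretation of the names and not something the release notes state. It is a plain reference loop that shows the arithmetic only; it does not call the library, and the function names and the scale parameter are hypothetical.

```python
def gemm_u8s8s32(a, b):
    """Reference C = A * B with uint8 A, int8 B, and integer accumulation.

    Python integers do not overflow, so the 32-bit accumulator of the
    real kernels is only modeled, not enforced.
    """
    m, k, n = len(a), len(b), len(b[0])
    for row in a:
        if len(row) != k or any(not 0 <= v <= 255 for v in row):
            raise ValueError("A must be m x k with uint8 entries")
    for row in b:
        if len(row) != n or any(not -128 <= v <= 127 for v in row):
            raise ValueError("B must be k x n with int8 entries")
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def downscale_to_s8(c, scale=1.0):
    """Scale, round, and saturate each entry to the int8 range [-128, 127]."""
    return [[max(-128, min(127, round(v * scale))) for v in row] for row in c]


a = [[1, 2], [255, 0]]
b = [[3, -4], [5, 6]]
c32 = gemm_u8s8s32(a, b)
print(c32)                   # [[13, 8], [765, -1020]]
print(downscale_to_s8(c32))  # [[13, 8], [127, -128]]
```

The second print shows why the s8-output variants need a downscale step: products of 8-bit inputs quickly exceed the 8-bit range, so results are saturated unless a scale factor brings them back into range.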

AOCL-BLIS 3.2

09 Jul 03:01

New features:

  • Extended BLAS function - DZGEMM
  • Progress feature for xGEMM and xTRSM APIs: the time taken to complete these operations grows steeply with input problem size, so this feature gives users periodic updates on operation progress.
  • Runtime Threading control using OpenMP APIs
  • Dynamic Dispatch covers APUs
  • Improved detection of standard x86-64 feature support
  • Minor bug fixes

Performance improvements in the following single-threaded and multi-threaded functions:

  • DGEMM, SGEMM, ZGEMM, and CGEMM
  • DTRSM, DGEMMT, ZTRSM, CTRSM, and DTRMM
  • SGEMV, DHER2, ZTRSV, and DSYMV
  • ?AXPBYV, SSCALV, DSCALV, ?DOTXV, and ZAXPY2V

AMD Optimized BLIS Version 3.1

13 Dec 07:03

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Supports Dynamic Dispatch and AOCL Dynamic feature
  • Improvements in DGEMM, ZGEMM, DTRSM, DSYRK, xGEMV, and DOTV

AMD Optimized BLIS Version 3.0.1

06 Jul 15:43

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of DGEMM for skinny matrix shapes.
  • Improvements in SGEMM and ZGEMM
  • Improved performance of Level-1 and Level-2 BLAS routines, including GEMV, DOT, and AXPY
  • Improvements in DTRSM for small matrix sizes

AMD Optimized BLIS Version 3.0

16 Mar 05:25
2728fc8

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Includes support for AMD’s Zen3 architecture. The build can auto-detect whether it is running on zen3 and enable features and optimizations specific to that architecture.
  • Improved performance of ?dotv, ?gemv, ?axpyv for complex and double complex datatypes
  • Includes support for copy transposition routines
  • New BLAS extension APIs added including cblas_?cabs1, cblas_i?amin, cblas_?axpby, cblas_?gemm_batch, cblas_?gemm3m
  • Debug trace and input logging support added for more BLIS APIs.
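
Several of the extension routines listed above have simple mathematical definitions, which the sketch below spells out as plain reference functions. These are illustrations of the conventional semantics only (y := alpha*x + beta*y for axpby, |Re z| + |Im z| for cabs1, and the zero-based index of the smallest absolute value for the real-valued i?amin); they are not the library's implementations, they return new values where the real routines update y in place, and the Python names are chosen for this example.

```python
def axpby(alpha, x, beta, y):
    """Return alpha*x + beta*y elementwise for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]


def cabs1(z):
    """Return |Re(z)| + |Im(z)|, the cheap complex magnitude used in BLAS."""
    return abs(z.real) + abs(z.imag)


def iamin(x):
    """Return the zero-based index of the first entry with the smallest |x[i]|."""
    if not x:
        raise ValueError("x must be non-empty")
    return min(range(len(x)), key=lambda i: abs(x[i]))


print(axpby(2.0, [1.0, 2.0, 3.0], 0.5, [4.0, 6.0, 8.0]))  # [4.0, 7.0, 10.0]
print(cabs1(3 - 4j))                                       # 7.0
print(iamin([3.0, -1.0, 1.0, 2.0]))                        # 1
```

In the iamin example both -1.0 and 1.0 have the smallest magnitude, and the first of them (index 1) is reported.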

AMD Optimized BLIS Version 2.2

30 Jun 07:07

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance for Level-1 BLAS routines for single and double precision.
  • Improved performance of SGEMV and DGEMV for large sizes.
  • Enabled small unpacked (SUP) GEMM kernels for single-precision and double-precision complex (C, Z) GEMM
  • Multi-threaded small unpacked (SUP) GEMM kernels enabled for (S, D, C, Z) GEMM, providing improved performance for small/skinny matrices.
  • The GEMM selective packing feature is now multithread enabled. Selective packing packs matrix A, matrix B, or both, and can be enabled by setting an environment variable. Refer to the AOCL User Guide at https://developer.amd.com/amd-aocl/ for details.
  • Improved TRSM single-thread and multi-thread performance for large and skinny matrices
  • Debug trace and log feature enabled for debug purposes.

AMD Optimized BLIS Version 2.1

14 Jan 04:35

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of SGEMM and DGEMM for small and skinny size matrices
  • Improved TRSM single thread performance for small and skinny size matrices
  • BLIS build now supports both AMD "zen" and "zen2" configurations with auto config option
  • Support for C++ Template APIs for all BLAS functions