Releases · amd/blis

11 Oct 03:28

sireeshasanga

5.0

34d4bba

AOCL-BLAS 5.0 (Latest)

AOCL-BLAS 5.0 Release Highlights

Added AMD “Zen5” support
AMD EPYC “Turin”-specific tuning for the following APIs: D/ZGEMM, DTRSM, and DNRM2
AVX-512 improvements for the following APIs: ZGEMV, D/ZAXPYF, D/ZDOTXF, ZDOTV, C/ZSCALV, DNRM2, S/D/ZCOPY, S/D/C/ZAXPBYV, DTRSV, DGEMMT, D/ZTRSM, and D/ZGEMM
Improvements to the AOCL_ENABLE_INSTRUCTIONS functionality
Additional APIs and post-ops support, along with improved performance for existing APIs, in the aocl_gemm add-on
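DNRM2, tuned in this release, computes the Euclidean norm of a vector. A naive sqrt(sum(x_i^2)) can overflow or underflow when elements are very large or very small, so the classic reference-BLAS approach keeps a running scale and a scaled sum of squares. The following is a minimal Python sketch of that technique for illustration only; it is not the AOCL-BLAS AVX-512 kernel, and the name `dnrm2_ref` is hypothetical.

```python
import math

def dnrm2_ref(x):
    """Euclidean norm of x using the scaled sum-of-squares method
    (as in the classic reference BLAS dnrm2), avoiding overflow in x_i**2."""
    scale = 0.0  # largest |x_i| seen so far
    ssq = 1.0    # sum of (x_i / scale)**2 over the elements seen so far
    for xi in x:
        if xi != 0.0:
            a = abs(xi)
            if scale < a:
                # Rescale the accumulated sum to the new, larger scale.
                ssq = 1.0 + ssq * (scale / a) ** 2
                scale = a
            else:
                ssq += (a / scale) ** 2
    return scale * math.sqrt(ssq)

# Squaring these elements directly would overflow a double.
print(dnrm2_ref([3e200, 4e200]))
```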


28 Feb 06:32

sireeshasanga

4.2

7c564c7

AOCL-BLAS 4.2

AOCL-BLAS 4.2 Release Highlights

Added uint8 output and zero-point support to the int8 APIs in the aocl_gemm add-on (Low Precision GEMM / LPGEMM)
Improved performance of the downscaled variants of all APIs in the aocl_gemm add-on (LPGEMM)
Improved multithreaded performance across APIs in the aocl_gemm add-on (LPGEMM)
Introduced the AOCL_ENABLE_INSTRUCTIONS environment variable as an alternative to BLIS_ARCH_TYPE, with slightly different semantics.
Improved the functionality of the XERBLA error-handling routine in AOCL-BLAS.
Performance optimizations for the following APIs:
- DGEMM for tiny sizes
- S/ZGEMM, D/ZTRSM, ZAXPBYV, Z/ZDSCALV, S/D/ZGEMV, and D/DZNRM2
The following BLAS extension APIs have been added for AMD “Zen” code paths only:
- sgemm_pack_get_size(), sgemm_pack(), and sgemm_compute()
- dgemm_pack_get_size(), dgemm_pack(), and dgemm_compute()
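In quantized GEMM, the int32 accumulator is typically "downscaled" to a narrower output type by multiplying by a scale factor, adding a zero-point, and saturating to the output range. The Python sketch below shows that common pattern for a uint8 output, as background for the uint8-output and zero-point support listed above. The exact formula, rounding mode, and parameter names used by aocl_gemm are assumptions here, and `downscale_to_u8` is a hypothetical helper.

```python
def downscale_to_u8(acc, scale, zero_point):
    """Map an int32 GEMM accumulator to uint8:
    q = clamp(round(acc * scale) + zero_point, 0, 255).
    Uses Python's round() (round-half-to-even); a real kernel may round differently."""
    q = round(acc * scale) + zero_point
    return max(0, min(255, q))  # saturate to the uint8 range

row = [-500, 0, 100, 4000]
print([downscale_to_u8(a, 0.05, 128) for a in row])
```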


07 Aug 15:39

sireeshasanga

4.1

a5a3c8b

AOCL-BLAS 4.1

AOCL-BLAS 4.1 Release Highlights

Additional APIs and post-ops support, along with improved performance for existing APIs, in the aocl_gemm add-on
Added dynamic dispatch and amdzen configuration support to the aocl_gemm add-on
Dynamic dispatch feature enhancements
AVX-512-based optimizations for the AMD “Zen4” platform:
- SGEMM, DGEMM, and ZGEMM
- DTRSM, D/ZAXPY, ZGEMV, DDOTV, and D/ZSCALV
Improved support for OpenMP nested parallelism.


13 Nov 07:06

pradeeptrgit

4.0

e3fc540

AOCL-BLIS 4.0

Highlights of AOCL-BLIS 4.0

The following LPGEMM (Low Precision GEMM) variants are added along with post-ops support:
- aocl_gemm_u8s8s32os32 and aocl_gemm_u8s8s32os8 routines are added and optimized using AVX-512-VNNI
- aocl_gemm_u8s8s16os16 and aocl_gemm_u8s8s16os8 routines are added and optimized using AVX2
- aocl_gemm_bf16bf16f32of32 and aocl_gemm_bf16bf16f32obf16 routines are added and optimized using AVX-512
SGEMM with packed/reorder buffer support (aocl_gemm_f32f32f32f32)
AMD “Zen4” support for BLIS
Dynamic dispatch supports AMD “Zen4” configuration
Optimizations and performance improvements for DGEMM, SGEMM, ZGEMM, DGEMMT, and DTRSM
Framework design changes
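The LPGEMM routine names appear to encode the data types involved: aocl_gemm_u8s8s32os32, for example, reads as uint8 A, int8 B, int32 accumulation, and int32 output. Assuming that reading, the following pure-Python reference shows what such a routine computes (C = A·B with integer accumulation). It is not the aocl_gemm API, signature, or post-ops pipeline, and `gemm_u8s8s32_ref` is a hypothetical name.

```python
def gemm_u8s8s32_ref(a, b):
    """C[i][j] = sum_p a[i][p] * b[p][j], with a in uint8, b in int8,
    and integer accumulation. Matrices are row-major lists of lists."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert all(0 <= v <= 255 for row in a for v in row), "A must be uint8"
    assert all(-128 <= v <= 127 for row in b for v in row), "B must be int8"
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Python ints never overflow; a real int32 kernel must keep
            # acc within [-2**31, 2**31 - 1].
            c[i][j] = acc
    return c

print(gemm_u8s8s32_ref([[1, 2], [255, 0]], [[-1, 3], [4, -128]]))
```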


09 Jul 03:01

dzambare

3.2

77c8f06

AOCL-BLIS 3.2

New features:

Extended BLAS function - DZGEMM
Progress feature for xGEMM and xTRSM APIs: the time taken to complete these operations grows rapidly with problem size (roughly cubically for square matrices), so this feature gives users periodic updates on the progress of long-running operations.
Runtime threading control using OpenMP APIs
Dynamic dispatch extended to cover APUs
Improved detection of standard x86-64 feature support
Minor bug fixes
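For context on the progress feature: multiplying an m×k matrix by a k×n matrix takes about 2·m·n·k floating-point operations (one multiply and one add per term), so the cost grows polynomially, cubically for square matrices, rather than exponentially. A short Python sketch of that count; `gemm_flops` is a hypothetical helper.

```python
def gemm_flops(m, n, k):
    """Approximate flop count of C = A*B with A m-by-k and B k-by-n:
    one multiply and one add for each (i, j, p) triple."""
    return 2 * m * n * k

for size in (1000, 2000, 4000):
    print(size, gemm_flops(size, size, size))
# Each doubling of the square size multiplies the work by 8.
```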

Performance improvements in the following single-threaded and multi-threaded functions:

DGEMM, SGEMM, ZGEMM, and CGEMM
DTRSM, DGEMMT, ZTRSM, CTRSM, and DTRMM
SGEMV, DHER2, ZTRSV, and DSYMV
?AXPBYV, SSCALV, DSCALV, ?DOTXV, and ZAXPY2V
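?AXPBYV is a BLAS extension that computes y := alpha·x + beta·y in a single pass; the standard AXPY is the special case beta = 1. A minimal Python reference of the operation follows, illustrating semantics only, not the optimized kernel or the actual C/Fortran interface. `axpbyv_ref` is a hypothetical name, strides are omitted, and the same code works for complex data because Python's `complex` type supports the arithmetic.

```python
def axpbyv_ref(alpha, x, beta, y):
    """In place: y[i] = alpha * x[i] + beta * y[i] for every i; returns y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] = alpha * xi + beta * y[i]
    return y

y = [1.0, 2.0, 3.0]
axpbyv_ref(2.0, [1.0, 1.0, 1.0], 0.5, y)
print(y)
```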


13 Dec 07:03

dzambare

3.1

3aa0044

AMD Optimized BLIS Version 3.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Supports dynamic dispatch and the AOCL Dynamic feature
Improvements in DGEMM, ZGEMM, DTRSM, DSYRK, xGEMV, and DOTV


06 Jul 15:43

pradeeptrgit

3.0.1

d3a65bd

AMD Optimized BLIS Version 3.0.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of DGEMM for skinny matrix shapes.
Improvements in SGEMM and ZGEMM
Improved performance of Level-1 and Level-2 BLAS routines, including GEMV, DOT, and AXPY
Improvements in DTRSM for small matrix sizes


16 Mar 05:25

pradeeptrgit

3.0

2728fc8

AMD Optimized BLIS Version 3.0

Highlights of improvements on AMD EPYC™ processor family CPUs

Includes support for AMD’s “Zen3” architecture. The build can auto-detect whether it is running on “Zen3” and enable features and optimizations specific to that architecture.
Improved performance of ?dotv, ?gemv, and ?axpyv for complex and double-complex datatypes
Includes support for copy transposition routines
New BLAS extension APIs added, including cblas_?cabs1, cblas_i?amin, cblas_?axpby, cblas_?gemm_batch, and cblas_?gemm3m
Debug trace and input logging support added for more BLIS APIs.
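Two of the new extension APIs have simple reference semantics: ?cabs1 returns |Re(z)| + |Im(z)|, the inexpensive magnitude BLAS uses for complex data, and i?amin returns the index of the element with the smallest such magnitude. The Python sketch below illustrates this under stated assumptions: the first minimum wins on ties, and the index is 0-based as CBLAS-style interfaces conventionally report it (Fortran-style interfaces are 1-based). The function names and the -1 result for an empty vector are choices made for this sketch only.

```python
def cabs1(z):
    """|Re(z)| + |Im(z)|, the BLAS 'cabs1' magnitude (works for real inputs too)."""
    z = complex(z)
    return abs(z.real) + abs(z.imag)

def iamin_ref(x):
    """0-based index of the first element of minimal magnitude
    (cabs1 for complex data, |x| for real data).
    Assumption for this sketch: returns -1 for an empty vector."""
    best_i, best_v = -1, None
    for i, xi in enumerate(x):
        v = cabs1(xi)  # equals abs(xi) for real xi
        if best_v is None or v < best_v:  # strict '<' keeps the first minimum
            best_i, best_v = i, v
    return best_i

print(cabs1(3 - 4j))
print(iamin_ref([2 + 2j, -1 + 0.5j, 0.5 - 1j]))
```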


30 Jun 07:07

pradeeptrgit

2.2

32365b3

AMD Optimized BLIS Version 2.2

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance for Level-1 BLAS routines for single and double precision.
Improved performance of SGEMV and DGEMV for large sizes.
Enabled small unpacked (SUP) GEMM kernels for single- and double-precision complex (C, Z) GEMM
Enabled multithreaded SUP GEMM kernels for S, D, C, and Z GEMM, improving performance for small and skinny matrices.
The GEMM selective packing feature is now multithread-enabled. Selective packing packs A, B, or both matrices and can be enabled by setting an environment variable; refer to the AOCL User Guide at https://developer.amd.com/amd-aocl/ for details.
Improved TRSM single-thread and multi-thread performance for large and skinny matrices
Added debug trace and logging features for debugging.


14 Jan 04:35

pradeeptrgit

2.1

b3e2938

AMD Optimized BLIS Version 2.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of SGEMM and DGEMM for small and skinny matrices
Improved TRSM single-threaded performance for small and skinny matrices
The BLIS build now supports both the AMD "zen" and "zen2" configurations via the auto configuration option
Support for C++ template APIs for all BLAS functions
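TRSM solves a triangular system with multiple right-hand sides, for example A·X = alpha·B with A triangular. In the left-side, lower-triangular, non-transposed, non-unit-diagonal case this reduces to forward substitution applied to each column of B. A minimal Python reference for that one case follows; it is not the blocked BLIS kernel, and `trsm_llnn_ref` and its argument order are hypothetical.

```python
def trsm_llnn_ref(alpha, a, b):
    """Solve A * X = alpha * B for X, with A lower triangular (non-unit diagonal),
    by forward substitution on each column of B. A is m-by-m, B is m-by-n,
    both row-major lists of lists; returns X as a new matrix."""
    m, n = len(a), len(b[0])
    x = [[alpha * b[i][j] for j in range(n)] for i in range(m)]
    for j in range(n):          # each right-hand side is independent
        for i in range(m):      # solve rows top to bottom
            s = x[i][j]
            for p in range(i):  # subtract contributions of already-solved unknowns
                s -= a[i][p] * x[p][j]
            x[i][j] = s / a[i][i]
    return x

print(trsm_llnn_ref(1.0, [[2.0, 0.0], [1.0, 4.0]], [[4.0, 2.0], [6.0, 9.0]]))
```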

