Commit deede6e — "Improved md formatting" (causten, Dec 11, 2024)
Showing 1 changed file with 66 additions and 69 deletions: CHANGELOG.md
Full documentation for MIGraphX is available at

## MIGraphX 2.11 for ROCm 6.3.0

### Added

* Initial code to run on Windows
* Support for gfx120x GPU
* Support for FP8 and INT4
* Support for the Log2 internal operator
* Support for the GCC 14 compiler
* The BitwiseAnd, Scan, SoftmaxCrossEntropyLoss, GridSample, and NegativeLogLikelihoodLoss ONNX operators
* The MatMulNBits, QuantizeLinear/DequantizeLinear, GroupQueryAttention, SkipSimplifiedLayerNormalization, and SimplifiedLayerNormalization Microsoft Contrib operators
* Dynamic batch parameter support for the OneHot operator
* Split-K as an optional performance improvement
* Scripts to validate ONNX models from the ONNX Model Zoo
* GPU Pooling Kernel
* `--mlir` flag for the migraphx-driver program to offload an entire module to MLIR
* Fusing split-reduce with MLIR
* Multiple outputs for the MLIR + Pointwise fusions
* Pointwise fusions with MLIR across reshape operations
* MIGRAPHX_MLIR_DUMP environment variable to dump MLIR modules to MXRs
* The MIGRAPHX_TRACE_BENCHMARKING=3 option to print the MLIR program for improved debug output
* MIGRAPHX_ENABLE_HIPBLASLT_GEMM environment variable to call hipBlasLt libraries
* MIGRAPHX_VERIFY_DUMP_DIFF to improve the debugging of accuracy issues
* reduce_any and reduce_all options to the Reduce operation via Torch MIGraphX
* Examples for RNNT and ControlNet
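One item in the list above, Split-K, is an algorithmic technique: the shared K (reduction) dimension of a GEMM is partitioned so partial products can be computed concurrently and summed afterwards. The following pure-Python sketch illustrates only the idea; it is not MIGraphX's GPU implementation, and the helper names are made up for this example.

```python
def matmul(a, b):
    """Plain row-by-column matrix multiply: C[i][j] = sum_k A[i][k] * B[k][j]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def matmul_split_k(a, b, splits):
    """Split the K dimension into chunks, compute one partial GEMM per chunk
    (on a GPU these could run on separate workgroups), then sum the partials."""
    k = len(b)
    chunk = (k + splits - 1) // splits
    partials = []
    for s in range(splits):
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        if lo >= hi:  # more splits requested than K elements; skip empty chunks
            break
        a_part = [row[lo:hi] for row in a]
        b_part = b[lo:hi]
        partials.append(matmul(a_part, b_part))
    # Final element-wise reduction of the partial products.
    rows, cols = len(a), len(b[0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]
```

The extra element-wise reduction is the cost Split-K pays for exposing more parallelism across the K dimension, which is why it is offered as an optional tuning choice rather than a default.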


### Changed

* Switched to MLIR's 3D Convolution operator.
* MLIR is now used for Attention operations by default on gfx942 and newer ASICs.
* Names and locations for VRM specific libraries have changed.
* Use random mode for benchmarking GEMMs and convolutions.
* Python version is now printed with an actual version number.


### Removed

* Disabled requirements for MIOpen and rocBlas when running on Windows.
* Removed inaccurate warning messages when using exhaustive-tune.
* Removed the hard-coded path in MIGRAPHX_CXX_COMPILER, allowing the compiler to be installed in different locations.

### Optimized

* Improved:
  * Infrastructure code to enable better kernel fusions with all supported data types
  * Subsequent model compile time by creating a cache for already performant kernels
  * Use of Attention fusion with models
  * Performance of the Softmax JIT kernel and of the Pooling operator
  * Tuning operations through a new 50 ms delay before running the next kernel
  * Performance of several convolution-based models through an optimized NHWC layout
  * Performance for the FP8 datatype
  * GPU utilization
  * Verification tools
  * Debug prints
  * Documentation, including gpu-driver utility documentation
  * Summary section of the `migraphx-driver perf` command
* Reduced model compilation time
* Reordered some compiler passes to allow for more fusions
* Preloaded tiles into LDS to improve performance of pointwise transposes
* Exposed the external_data_path property in onnx_options to set the path from onnxruntime
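The LDS preloading item above uses a standard GPU transpose trick: stage a small tile in fast local storage, then write it back transposed, so both the reads and the writes touch contiguous memory. A pure-Python sketch of the tiling pattern follows; the staging dict only stands in conceptually for workgroup LDS, and the function name is invented for this example.

```python
def transpose_tiled(matrix, tile=32):
    """Transpose a 2-D list-of-lists one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Stage one tile (on a GPU, this copy would land in LDS).
            staged = {(r, c): matrix[r][c]
                      for r in range(r0, min(r0 + tile, rows))
                      for c in range(c0, min(c0 + tile, cols))}
            # Drain the tile transposed: writes stay within one output tile.
            for (r, c), v in staged.items():
                out[c][r] = v
    return out
```

On hardware the payoff is memory coalescing; in this scalar sketch the two loops are equivalent to a direct transpose, which makes the pattern easy to verify.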


### Resolved Issues


* Fixed a bug with gfx1030 that overwrote dpp_reduce.
* Fixed a bug in 1arg dynamic reshape that created a failure.
* Fixed a bug with dot_broadcast and inner_broadcast that caused compile failures.
* Fixed a bug where some configs were failing when using exhaustive-tune.
* Fixed the ROCM Install Guide URL.
* Fixed an issue while building a whl package due to an apostrophe.
* Fixed the BERT Squad example requirements file to support different versions of Python.
* Fixed a bug that stopped the Vicuna model from compiling.
* Fixed failures with the verify option of migraphx-driver that would cause the application to exit early.
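For context on the first fix: dpp_reduce is the cross-lane wavefront reduction built on AMD's data-parallel primitives (DPP), which exchange values between lanes in log2(width) butterfly steps. A scalar sketch of that pattern, assuming a power-of-two lane count (illustrative only; the function name is invented and the real kernel operates on hardware lanes, not a list):

```python
def butterfly_reduce(lanes):
    """Sum per-lane values the way a cross-lane XOR butterfly does:
    at each step, every lane adds the value held by its partner `offset` away,
    so after log2(n) steps every lane holds the full sum."""
    vals = list(lanes)
    n = len(vals)
    assert n and n & (n - 1) == 0, "lane count must be a power of two"
    offset = 1
    while offset < n:
        vals = [vals[i] + vals[i ^ offset] for i in range(n)]
        offset *= 2
    return vals[0]
```

Because every lane ends up with the result, a bug that overwrites a lane's staged value mid-exchange, as described for gfx1030, corrupts the whole reduction, which is why such fixes matter for correctness rather than just performance.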


## MIGraphX 2.10 for ROCm 6.2.0