Microsoft Contrib Operator TODO #3538
Replies: 7 comments 1 reply
-
Took a look at the SD 1.5 Olive set of models (UNet, VAE encoder/decoder): GroupNorm is everywhere, consistently before multiple convolutions. The UNet also has the MultiHeadAttention operator.
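For reference, the math behind the GroupNorm op these models lean on can be sketched in a few lines of NumPy. This is a hand-rolled illustration of the standard group-normalization formula over an NCHW tensor, not the contrib-op or MIGraphX implementation:

```python
import numpy as np

def group_norm(x, gamma, beta, num_groups, eps=1e-5):
    """Normalize each group of channels by its own mean/variance,
    then apply a per-channel scale (gamma) and shift (beta).
    x: (N, C, H, W); C must be divisible by num_groups."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    x = g.reshape(n, c, h, w)
    return x * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)
```

Because the whole thing is reshapes plus reductions, it maps naturally onto existing MIGraphX primitives.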
-
Looking at the SD 3 Medium model: the UNet and Text Encoders 1/2/3 need LayerNormalization. We have this in MIGraphX; it just needs support in Onnxruntime (ROCm/onnxruntime#73). Tokenizers 1/2: seeing CLIPTokenizer.
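The LayerNormalization needed here is the standard last-axis normalization from the ONNX spec. A minimal NumPy sketch of the semantics (illustrative only, not MIGraphX code):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last axis, then scale/shift per feature.
    x: (..., hidden); gamma, beta: (hidden,)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta
```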
-
The SD XL UNet needs MultiHeadAttention and GroupNorm.
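The core of MultiHeadAttention is multi-head scaled dot-product attention. The sketch below shows just that math (no masks, bias, or past-key/value caching, all of which the com.microsoft.MultiHeadAttention contrib op also supports):

```python
import numpy as np

def multi_head_attention(q, k, v, num_heads):
    """Minimal multi-head scaled dot-product attention.
    q, k, v: (batch, seq, hidden); hidden must divide by num_heads."""
    b, s, hidden = q.shape
    d = hidden // num_heads

    def split(t):  # (b, seq, hidden) -> (b, heads, seq, d)
        return t.reshape(b, -1, num_heads, d).transpose(0, 2, 1, 3)

    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = probs @ vh                              # (b, heads, seq, d)
    return out.transpose(0, 2, 1, 3).reshape(b, s, hidden)
```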
-
SD Turbo ControlNet/UNet/VAE encoder/VAE decoder need GroupNorm.
-
Flux1: we should be able to run this without any additional code modifications (MIGraphX + MIGraphX EP).
-
MatMulIntegerToFloat seems to be used by optimized BERT models that have been int8-quantized by Onnxruntime, which emits DynamicQuantizeLinear + MatMulInteger + a Cast to perform the dequantization.
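To make the fusion concrete, here is a NumPy sketch of what MatMulIntegerToFloat computes: the integer matmul (with zero points subtracted) followed immediately by the cast and scale multiply that would otherwise appear as separate Cast/Mul nodes. Illustrative only; the real op also supports per-column scales and an optional bias:

```python
import numpy as np

def matmul_integer_to_float(a_q, b_q, a_scale, b_scale, a_zp=0, b_zp=0):
    """Fused equivalent of Cast(MatMulInteger(a_q, b_q)) * a_scale * b_scale.
    a_q, b_q: int8/uint8 quantized inputs; a_zp, b_zp: zero points."""
    acc = (a_q.astype(np.int32) - a_zp) @ (b_q.astype(np.int32) - b_zp)
    return acc.astype(np.float32) * (a_scale * b_scale)
```

Since the accumulation is in int32 and the scales commute with the matmul, fusing the cast/multiply into the matmul changes no results, only the graph shape.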
-
Llama V2 requires that we also have the RotaryEmbedding operator. Adding it to this list.
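The rotary embedding math is small: each consecutive (even, odd) feature pair is rotated by a position- and frequency-dependent angle. A sketch using the interleaved-pair convention (the contrib op has more layout and cos/sin-cache options than shown here):

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, dim), dim even.
    Pair (x[2i], x[2i+1]) at position p is rotated by angle p * base**(-2i/dim)."""
    seq, dim = x.shape
    pos = np.arange(seq)[:, None]                  # (seq, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = pos * freqs                           # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Being pure rotations, the op preserves vector norms and leaves position 0 unchanged, which makes it cheap to test.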
-
As noted in a few meetings now, the various quantizations and optimizations applied when models are converted to ONNX format incorporate many operators from the Microsoft contrib operator set.
This set is a superset of the ONNX specification: https://github.com/onnx/onnx/blob/main/docs/Operators.md
The full list of Microsoft contrib operators is found here: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulIntegerToFloat
Eventually the goal is to support the Microsoft operators in MIGraphX to the same degree we support the ONNX spec. Luckily, many of these operators are composites of a subset of MIGraphX ops, or variants of existing ones, so we can leverage existing parsing, optimizations, and functionality in the codebase.
Below are the operators that have recently surfaced from some of our models and the various quantizations we want supported sooner rather than later.
Please add others as you see fit or as you come across them in your runs. To help with planning, I've formatted entries as follows:
Model - Toolchain (Quark/Olive/Onnxruntime) + Model Name - Status (linked to a PR in MIGraphX, or TBD)