Compute input min/max with a single vectorized pass in DynamicQuantizeLinear #531

robertknight · 2025-01-10T23:26:04Z

Replace the separate passes over the input that compute the min and max in DynamicQuantizeLinear with a single vectorized pass.

One caveat: the new implementation doesn't guarantee the same handling of NaNs in the input as before, and the behavior will vary by architecture. The ReduceMin / ReduceMax ops always propagate NaNs, whereas this implementation just uses the obvious min/max intrinsic (e.g. `_mm256_min_ps`), which may do something else.
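To make the NaN caveat concrete, here is a minimal scalar sketch. It assumes the documented x86 `minps` lane rule (the result is the second operand unless `a < b`) and compares it with a NaN-propagating min. The function names are hypothetical and are not part of rten or rten-vecmath; other architectures may behave differently.

```rust
/// Scalar model of one lane of x86 `minps` / `_mm256_min_ps`:
/// the second operand is returned unless `a < b`, so whenever either
/// operand is NaN the comparison is false and `b` is returned.
fn x86_style_min(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// NaN-propagating min: any NaN operand makes the result NaN.
fn propagating_min(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        f32::NAN
    } else {
        a.min(b)
    }
}

/// Fold a slice with the given min function. The accumulator is passed
/// as the first operand and the new element as the second.
fn fold_min(xs: &[f32], min: impl Fn(f32, f32) -> f32) -> f32 {
    xs.iter().copied().fold(f32::INFINITY, |acc, x| min(acc, x))
}
```

With the x86-style rule, a NaN in the middle of the input is overwritten by later elements and only a trailing NaN survives, so the result depends on both operand order and where the NaN sits. The propagating version returns NaN in every such case.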

Previously two separate passes over the data were used to compute the min/max values. Use the `MinMax` op from rten-vecmath to compute this in one vectorized pass. In a benchmark with a quantized ModernBERT model this made DynamicQuantizeLinear 2.5-3x faster.
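As an illustration of the single-pass idea, the sketch below tracks per-lane running minima and maxima in one loop and reduces them at the end. It is not the `MinMax` implementation from rten-vecmath: the `min_max` name, the `LANES` constant and the reliance on compiler auto-vectorization instead of explicit intrinsics are all assumptions made for this example.

```rust
/// Hypothetical helper: compute (min, max) of a slice in one pass.
/// Returns `None` for an empty slice. NaN handling is unspecified here.
fn min_max(xs: &[f32]) -> Option<(f32, f32)> {
    if xs.is_empty() {
        return None;
    }

    // Assumed lane count; 8 matches the width of an AVX2 f32 vector.
    const LANES: usize = 8;
    let mut mins = [f32::INFINITY; LANES];
    let mut maxs = [f32::NEG_INFINITY; LANES];

    // Main loop: each element is loaded once and updates both accumulators.
    let mut chunks = xs.chunks_exact(LANES);
    for chunk in chunks.by_ref() {
        for i in 0..LANES {
            let x = chunk[i];
            mins[i] = if x < mins[i] { x } else { mins[i] };
            maxs[i] = if x > maxs[i] { x } else { maxs[i] };
        }
    }

    // Horizontal reduction of the per-lane accumulators.
    let mut min = f32::INFINITY;
    let mut max = f32::NEG_INFINITY;
    for i in 0..LANES {
        min = if mins[i] < min { mins[i] } else { min };
        max = if maxs[i] > max { maxs[i] } else { max };
    }

    // Tail elements that did not fill a whole chunk.
    for &x in chunks.remainder() {
        min = if x < min { x } else { min };
        max = if x > max { x } else { max };
    }

    Some((min, max))
}
```

Compared with calling a min reduction and then a max reduction, this reads the input from memory once, which is where most of the saving comes from on large inputs.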

robertknight · 2025-01-10T23:49:13Z

For reference, ONNX Runtime (ORT) uses the same intrinsics as this PR, or on x64 the SSE versions with the same behavior - https://github.com/microsoft/onnxruntime/blob/ecdeecae617d1b37b42bca51e1ade979dd260961/onnxruntime/core/mlas/lib/mlasi.h#L2230.

robertknight added 5 commits January 10, 2025 22:59

Take clippy's advice about using if let instead of match

b030964

Add SimdFloat::min method

b6eebb9

There was already a `max` method, so this fills in a gap.

Add simd_fold_array helper for vectorized ops

c546146

This is useful for reductions which need to compute multiple values in one pass over the data.

Add MinMax vectorized operation

395d57c

This allows for computing the minimum and maximum values in a slice of floats with one pass over the slice.

robertknight merged commit c6d4245 into main Jan 10, 2025
2 checks passed

robertknight deleted the dynamic-quantize-linear-min-max-simd branch January 10, 2025 23:49
