Minor: Document SIMD rationale and tips #6554

alamb · 2024-10-13T13:03:13Z

Which issue does this PR close?

Closes #.

Rationale for this change

@tustvold wrote up some great tips / rationale on apache/datafusion#12821 (comment) that I thought would be good to add in the docs of this repo

What changes are included in this PR?

Add documentation on the rationale for not using manual SIMD, as well as tips/tricks to get the code to properly vectorize.

See rendered version here: https://github.com/alamb/arrow-rs/blob/alamb/simd_docs/arrow/CONTRIBUTING.md

Are there any user-facing changes?

Just docs

findepi · 2024-10-13T20:30:35Z

arrow/CONTRIBUTING.md

+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.


"... on the compiler's ..." ?

(in fact, vectorization could be applied on Rust MIR level, before LLVM?)

I'll confess it has been a while since I dug into rustc, but I would have thought MIR to be too high level to effectively perform auto-vectorisation, which is extremely ISA specific. The best it could do would be to use LLVM's vector types, but general heuristics for doing this would be hard.

I changed the docs to say "the Rust compiler's auto-vectorization" as I think that is the high level description of what is going on.

In this context, I think the use of LLVM is an "implementation detail" (albeit an important one) about how that auto-vectorization is accomplished.

findepi · 2024-10-13T20:30:59Z

arrow/CONTRIBUTING.md

+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair


stuttered "to"

findepi · 2024-10-13T20:31:13Z

arrow/CONTRIBUTING.md

+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.


was -> turned out ?

I rephrased the sentence to hopefully be clearer now

"In fact, this crate used to contain several manual SIMD implementations, which were removed after discovering the auto-vectorized code was faster."

findepi · 2024-10-13T20:31:28Z

arrow/CONTRIBUTING.md

+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too complex


extra whitespace before ,
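
To illustrate the first condition quoted above ("No conditionals within the loop body"), here is a minimal sketch. The function names and the `valid` slice are hypothetical and not taken from this crate. The idea, as an assumption about one common approach, is to compute the value for every slot and let a separate validity mask decide which results matter, rather than branching per element. Whether the compiler actually vectorizes either loop is not guaranteed and should be checked in the generated assembly.

```rust
/// Branchy version: the `if` inside the loop body makes each iteration
/// conditional, which may make the loop harder to auto-vectorize.
pub fn add_one_branchy(values: &mut [i32], valid: &[bool]) {
    for (v, &is_valid) in values.iter_mut().zip(valid) {
        if is_valid {
            *v = v.wrapping_add(1);
        }
    }
}

/// Unconditional version: every element takes the same path.
/// Slots considered invalid still get a value, which the caller
/// is assumed to ignore by consulting the validity mask separately.
pub fn add_one_unconditional(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v = v.wrapping_add(1);
    }
}
```

The two functions agree on every valid slot; they differ only in slots the caller is assumed to ignore.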

findepi · 2024-10-13T20:31:47Z

arrow/CONTRIBUTING.md

+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too complex
+3. No bitwise horizontal reductions or masking


is "bitwise horizontal reductions" an obvious term?

It is a class of SIMD operations. I think if people don't know what this refers to, they probably aren't the audience for this.

Thanks @tustvold, I see your point.
OTOH, SIMD is a widely known term and people may come to read this doc out of sheer interest in how we think about SIMD-izing the code. The term stands out from the rest of the text as less understood, and https://www.google.com/search?q=bitwise+horizontal+reductions doesn't bring up an obvious definition.

Perhaps we could link to https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html

TIL: That is a nice description

I reworded this item to

No [horizontal reductions] or data dependencies
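
For readers unfamiliar with the terms in this thread, a minimal sketch of the difference between a vertical operation and a horizontal reduction follows. The function names are hypothetical and chosen only for illustration.

```rust
/// Vertical operation: `out[i]` depends only on `a[i]` and `b[i]`,
/// so iterations are independent of one another.
pub fn add_vertical(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = *x + *y;
    }
}

/// Horizontal reduction: each step depends on the accumulator produced
/// by the previous step, which is a data dependency across iterations.
pub fn sum_horizontal(a: &[f32]) -> f32 {
    let mut acc = 0.0_f32;
    for &x in a {
        acc += x;
    }
    acc
}
```

Because floating-point addition is not associative, the compiler generally cannot reorder the additions in `sum_horizontal` on its own, which is one reason reductions like this tend to vectorize less readily than the element-wise loop.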

findepi · 2024-10-13T20:32:14Z

arrow/CONTRIBUTING.md

+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)


Prefer passive voice. "SIMD instructions are enabled in the target ISA"

Changed to "Suitable SIMD instructions available in the target ISA (e.g. target-cpu RUSTFLAGS flag)"
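
As a rough way to see which SIMD target features a build was compiled with, the sketch below uses the standard `cfg!(target_feature = "...")` macro. The function name and the particular features checked are hypothetical choices for illustration; the set that is enabled depends on the target and on flags such as `-C target-cpu`.

```rust
/// Returns the names of a few SIMD target features that were enabled
/// at compile time. Building with, for example,
/// `RUSTFLAGS="-C target-cpu=native" cargo build --release`
/// may change the result on the same machine.
pub fn compiled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```

Note that this reports what the compiler was allowed to emit, not what the CPU running the binary supports.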

findepi · 2024-10-13T20:32:35Z

arrow/CONTRIBUTING.md

+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with


your code -> the code

findepi · 2024-10-13T20:33:35Z

arrow/CONTRIBUTING.md

+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with
+tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and


again being sure to set RUSTFLAGS

requires setting RUSTFLAGS properly
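
A small standalone kernel like the sketch below is the kind of thing that can be pasted into Compiler Explorer to inspect the generated assembly. The function is hypothetical, and the flags in the comment are only one example of a reasonable setting.

```rust
/// Multiplies every element by `factor`.
///
/// Assumed compiler flags for inspection, for example:
///   -C opt-level=3 -C target-cpu=native
/// With vectorization, x86 output would be expected to contain packed
/// instructions such as `mulps` or `vmulps` rather than only scalar `mulss`.
pub fn scale(values: &mut [f32], factor: f32) {
    for v in values.iter_mut() {
        *v *= factor;
    }
}
```

Keeping the function `pub` and free of callers helps ensure the compiler emits it as a separate symbol that is easy to find in the output.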

etseidl

Thanks @alamb and @tustvold. I find this addition quite useful.

Dandandan · 2024-10-14T05:56:03Z

arrow/CONTRIBUTING.md

+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too complex


Suggested change

2. Not too much inlining , as the vectorizer gives up if the code is too complex

2. Not too much inlining necessary, as the vectorizer gives up if the code is too complex

I think this changes the meaning, which is that overzealous use of inline can break the vectorizer

Ah ok, the phrasing was not clear to me. Maybe use "inlining hints" then?

Changed it to be "Not too much #[inline]"

That also changes the meaning, as we have to use #[inline(never)] in various places to actively stop the compiler from inlining things

🤔

How about "not too much inlining (judicious use of #[inline] and #[inline(never)] as the vectorizer gives up if the code is too complex)"?

I'd move the bracket to "not too much inlining (judicious use of #[inline] and #[inline(never)]) as the vectorizer gives up if the code is too complex" but sounds good to me

in b32679a
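
To make the inlining discussion in this thread concrete, here is a minimal sketch of the two attributes mentioned. All names are hypothetical, and the placement of the attributes is an assumption about one way to apply the advice, not code from this crate.

```rust
/// Small per-element operation. `#[inline]` hints that it should be
/// folded into the loop that calls it.
#[inline]
fn op(x: i64) -> i64 {
    x.wrapping_mul(3).wrapping_add(1)
}

/// The loop itself is marked `#[inline(never)]` so it stays a small,
/// separate function instead of being merged into a larger caller,
/// where the surrounding code could make it too complex to vectorize.
#[inline(never)]
pub fn apply_kernel(input: &[i64], output: &mut [i64]) {
    for (o, &x) in output.iter_mut().zip(input) {
        *o = op(x);
    }
}

/// Caller that allocates the output and invokes the kernel.
pub fn run(input: &[i64]) -> Vec<i64> {
    let mut out = vec![0; input.len()];
    apply_kernel(input, &mut out);
    out
}
```

Both attributes are hints to the compiler, so the effect on vectorization should be confirmed by inspecting the generated code.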

alamb · 2024-10-16T10:26:42Z

Starting to incorporate comments

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>

alamb · 2024-10-16T10:46:54Z

I think it is looking pretty good now -- rendered version: https://github.com/alamb/arrow-rs/blob/alamb/simd_docs/arrow/CONTRIBUTING.md

arrow/CONTRIBUTING.md

alamb · 2024-10-17T10:45:02Z

Integration test failure

https://github.com/apache/arrow-rs/actions/runs/11366178706/job/31615971819?pr=6554

tracked by #6577

alamb · 2024-10-17T10:45:36Z

Thanks everyone -- I am happy to update this text further as well, as part of some follow-on PRs

Minor: Document SIMD rationale and tips

773a0b0

alamb added the documentation label (Improvements or additions to documentation) Oct 13, 2024

github-actions bot added the arrow label (Changes to the arrow crate) Oct 13, 2024

alamb mentioned this pull request Oct 13, 2024

[DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench apache/datafusion#12821

Closed

findepi approved these changes Oct 13, 2024

etseidl reviewed Oct 14, 2024

Dandandan reviewed Oct 14, 2024

tustvold approved these changes Oct 16, 2024

alamb and others added 4 commits October 16, 2024 06:29

Apply suggestions from code review

aefbd7f

Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>

Merge remote-tracking branch 'apache/master' into alamb/simd_docs

efe58aa

More review feedback

881d2cd

tweak

8539876

alamb commented Oct 16, 2024

Update arrow/CONTRIBUTING.md

b442d52

alamb commented Oct 16, 2024

Update arrow/CONTRIBUTING.md

cb5627e

alamb mentioned this pull request Oct 16, 2024

Implement GroupColumn support for StringView / ByteView (faster grouping performance) apache/datafusion#12809

Merged

alamb added 2 commits October 16, 2024 08:59

clarify inlining more

b32679a

formating

1b5b7b0

tustvold mentioned this pull request Oct 16, 2024

WASM in Browser Performance Improvements #6570

Closed

alamb merged commit 9485897 into apache:master Oct 17, 2024
22 of 23 checks passed

alamb deleted the alamb/simd_docs branch October 17, 2024 10:45

alamb mentioned this pull request Nov 16, 2024

Archery Integration Test with c# failing on main #6577

Closed
