Minor: Document SIMD rationale and tips #6554

Merged: 9 commits, Oct 17, 2024
Changes from 1 commit

30 changes: 30 additions & 0 deletions arrow/CONTRIBUTING.md
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code comments. For example:
// This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn
```

### Usage of SIMD / Auto vectorization

This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
instead relies on LLVM's auto-vectorization.

Member:

"... on the compiler's ..." ?

(in fact, vectorization could be applied at the Rust MIR level, before LLVM?)

@tustvold (Contributor), Oct 13, 2024:

I'll confess it is a while since I dug into rustc, but I would have thought MIR to be too high-level to effectively perform auto-vectorisation, which is extremely ISA-specific; the best it could do would be to use LLVM's vector types, but general heuristics for doing this would be hard.

Contributor Author:

I changed the docs to say "the Rust compiler's auto-vectorization" as I think that is the high-level description of what is going on.

In this context, I think the use of LLVM is an "implementation detail" (albeit an important one) about how that auto-vectorization is accomplished.


SIMD intrinsics are difficult to maintain and can be difficult to reason about.
The auto-vectorizer in LLVM is quite good and often produces better code than
hand-written manual uses of SIMD. In fact, this crate used to to have a fair

Member:

stuttered "to"

Contributor Author:

fixed

amount of manual SIMD, and over time we've removed it as the auto-vectorized
code was faster.

Member:

was -> turned out ?

Contributor Author:

I rephrased the sentence to hopefully be clearer now

"In fact, this crate used to contain several manual SIMD implementations, which were removed after discovering the auto-vectorized code was faster."


[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html

LLVM is relatively good at vectorizing vertical operations provided:

1. No conditionals within the loop body
2. Not too much inlining , as the vectorizer gives up if the code is too complex

Member:

extra whitespace before ,

Contributor:

Suggested change:
- 2. Not too much inlining , as the vectorizer gives up if the code is too complex
+ 2. Not too much inlining necessary, as the vectorizer gives up if the code is too complex

Contributor:

I think this changes the meaning, which is that over-zealous use of inlining can break the vectorizer.

Contributor:

Ah ok, the phrasing was not clear to me. Maybe use "inlining hints" then?

Contributor Author:

Changed it to be "Not too much #[inline]"

@tustvold (Contributor), Oct 16, 2024:

That also changes the meaning, as we have to use #[inline(never)] in various places to actively stop the compiler from inlining things

Contributor Author:

🤔

How about "not too much inlining (judicious use of #[inline] and #[inline(never)] as the vectorizer gives up if the code is too complex)

Contributor:

I'd move the bracket to "not too much inlining (judicious use of #[inline] and #[inline(never)]) as the vectorizer gives up if the code is too complex" but sounds good to me

Contributor Author:

in b32679a

3. No bitwise horizontal reductions or masking

Member:

is "bitwise horizontal reductions" an obvious term?

Contributor:

It is a class of SIMD operations; I think if people don't know what this refers to, they probably aren't the audience for this.

Member:

Thanks @tustvold, I see your point.
OTOH, SIMD is a widely known term and people may come to read this doc out of sheer interest in how we think about simdizing the code. The term stands out from the rest of the text as less understood, and https://www.google.com/search?q=bitwise+horizontal+reductions doesn't bring up an obvious definition.

Contributor Author:

Perhaps we could link to https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html

TIL: That is a nice description

I reworded this item to

  1. No [horizontal reductions] or data dependencies

4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)

Member:

Prefer passive voice. "SIMD instructions are enabled in the target ISA"

Contributor Author:

Changed to "Suitable SIMD instructions available in the target ISA (e.g. target-cpu RUSTFLAGS flag)"


The last point is especially important as the default `target-cpu` doesn't
support many SIMD instructions. See the Performance Tips section at the
end of <https://crates.io/crates/arrow>
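
For illustration only (this sketch is not part of the PR), a loop that meets the criteria above (no conditionals, no horizontal reductions, no data dependencies between iterations) might look like the following; the function name and signature are made up for the example:

```rust
/// Element-wise addition of two equal-length slices: a branch-free
/// "vertical" operation that LLVM can lower to packed SIMD instructions
/// when a suitable `target-cpu` is enabled.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    // Each output element depends only on a[i] and b[i], so iterations
    // are independent and the loop vectorizes cleanly.
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}
```

Building such code with, for example, `RUSTFLAGS="-C target-cpu=native" cargo build --release` lets the vectorizer use the full instruction set of the build machine rather than the conservative default.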

To ensure your code is fully vectorized, we recommend getting familiar with

Member:

your code -> the code

tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and

Member:

"again being sure to set RUSTFLAGS" -> "requires to set RUSTFLAGS properly"

Contributor Author:

done

only once you've exhausted that avenue think of reaching for manual SIMD.
Generally the hard part is getting the algorithm structured in such a way that
it can be vectorized, regardless of what ends up generating those instructions.
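
As a sketch of what that verification workflow can look like (the kernel below is a made-up example, not code from this crate): paste it into <https://rust.godbolt.org/>, add `-C opt-level=3 -C target-cpu=x86-64-v3` (or flags matching your target) to the compiler options, and check the generated assembly for packed instructions.

```rust
/// Hypothetical kernel to inspect on godbolt: with optimizations and a
/// SIMD-capable `target-cpu`, the loop body should show up as packed
/// (`*ps`) instructions operating on whole vectors rather than scalar
/// (`*ss`) instructions.
pub fn scale_and_offset(values: &mut [f32], scale: f32, offset: f32) {
    for v in values.iter_mut() {
        *v = *v * scale + offset;
    }
}
```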

# Releases and publishing to crates.io

Please see the [release](../dev/release/README.md) for details on how to create arrow releases