Promote `bitfieldExtract` and `bitfieldInsert` to become Slang intrinsics #5020

natevm · 2024-09-06T00:38:40Z

For context, see the related issue here

In short, bitfield insertion and extraction are useful tools for reinterpreting structures where types have differing sizes, as they allow for simpler and more hardware-efficient instructions for byte manipulation.

Before this change, bitfieldExtract and bitfieldInsert were only supported outside of the compiler, when using -allow-glsl.

Here, I'm introducing kIROp_BitfieldExtract and kIROp_BitfieldInsert instructions, which by default generate the original fallback bitfield manipulation logic, but now in CLikeSourceEmitter.

Then, when emitting SPIR-V or GLSL, these compilation paths override this default logic to instead tap into more specialized intrinsics.
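
As a rough illustration of what a shift-and-mask fallback computes, here is a minimal sketch in plain C++ for unsigned 32-bit values. It is not the code the emitter actually generates; the names `lowMask`, `bfeFallback` and `bfiFallback` are made up for this example, and the argument order of the insert follows GLSL's `bitfieldInsert(base, insert, offset, bits)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with the low `bits` bits set.
// bits == 32 is handled separately because shifting a 32-bit value by 32
// is undefined behavior in C++.
static uint32_t lowMask(int bits)
{
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Extract `bits` bits of `value` starting at bit `offset`, zero-extended.
// Assumes 0 <= offset < 32 and 0 <= bits <= 32.
uint32_t bfeFallback(uint32_t value, int offset, int bits)
{
    assert(offset >= 0 && offset < 32 && bits >= 0 && bits <= 32);
    return (value >> offset) & lowMask(bits);
}

// Replace `bits` bits of `base`, starting at bit `offset`, with the low
// bits of `insert`. Same range assumptions as above.
uint32_t bfiFallback(uint32_t base, uint32_t insert, int offset, int bits)
{
    assert(offset >= 0 && offset < 32 && bits >= 0 && bits <= 32);
    uint32_t mask = lowMask(bits) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}
```

For example, `bfeFallback(0xABCD1234u, 8, 8)` picks out the second-lowest byte, and `bfiFallback` leaves every bit outside the target field untouched.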


csyonghe

This needs to come with a test.

source/slang/core.meta.slang

source/slang/slang-emit-c-like.cpp

natevm · 2024-09-06T07:32:16Z

> This needs to come with a test.

Agreed. Looks like the current test suite is passing, which is good. I’ll see if I can write something up that tests a couple of different target backends.


natevm · 2024-09-08T23:30:05Z

@csyonghe What's the correct approach to constructing vectors for the CUDA backend?

bitfieldExtract is a component-wise intrinsic, so I need to support a function like this, whose result can be used as the right-hand side of an assignment:

uint4 bfe(uint4 val, int off, int bits) {
    return ((val >> off) & ((1u << bits) - 1));
}

However, every time I try to construct an int4 from off, I get errors similar to the following:

nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "unsigned int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"
nvrtc 11.2: (25): error : no suitable constructor exists to convert from "int" to "int4"

I've tried constructing an int4 as int4(off, off, off, off) but this does not work either.

I could try emitting a make_int4 as a special case for CUDA, but this doesn't generalize easily to uint8_t, uint16_t, or uint64_t. (I suppose I could make a lookup table… but that seems a bit heavy handed…)

natevm · 2024-09-09T00:40:16Z

For the time being, looks like a special case lookup table works for CUDA.
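
A lookup table of this kind might look roughly like the sketch below. This is only an illustration in plain C++: the table contents, the scalar type spellings used as keys, and the name `cudaMakeVectorName` are assumptions for the example and are not taken from the PR. The `make_*` constructor names themselves are CUDA's built-in vector constructors.

```cpp
#include <cstring>
#include <string>

// Hypothetical table mapping a scalar type name (as it might appear in
// generated code) to the base name of CUDA's built-in vector type.
struct CudaVectorName
{
    const char* scalarType;
    const char* vectorBase;
};

static const CudaVectorName kCudaVectorNames[] = {
    {"int8_t", "char"},
    {"uint8_t", "uchar"},
    {"int16_t", "short"},
    {"uint16_t", "ushort"},
    {"int", "int"},
    {"unsigned int", "uint"},
    {"int64_t", "longlong"},
    {"uint64_t", "ulonglong"},
};

// Returns e.g. "make_uint4" for ("unsigned int", 4), or an empty string
// when the scalar type is not in the table.
std::string cudaMakeVectorName(const char* scalarType, int count)
{
    for (const CudaVectorName& entry : kCudaVectorNames)
    {
        if (std::strcmp(entry.scalarType, scalarType) == 0)
            return std::string("make_") + entry.vectorBase + std::to_string(count);
    }
    return std::string();
}
```

With a name like `make_int4` in hand, the emitter can broadcast a scalar by writing `make_int4(off, off, off, off)`.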


natevm · 2024-09-09T03:09:21Z

On adding these tests, I noticed that the current logic in glsl.meta.slang for bitfield extraction in the signed integer case was incorrect. For signed integers, the extracted bits are expected to be sign-extended, but the fallback for this wasn’t doing sign extension.

The current logic now correctly accounts for this, and I have some explicit tests to back that up now too.
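
To make the sign-extension requirement concrete, here is a minimal C++ sketch of a signed 32-bit extract. It is one common way to get sign extension (shift the field to the top of the word, then arithmetic-shift it back down) and is not necessarily how the fallback in this PR is written; `bfeSigned` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Signed bitfield extract: take `bits` bits starting at `offset` and
// sign-extend the result from the top bit of the extracted field.
// Assumes offset >= 0, bits > 0, and offset + bits <= 32.
int32_t bfeSigned(int32_t value, int offset, int bits)
{
    assert(offset >= 0 && bits > 0 && offset + bits <= 32);
    // Move the field to the top of the word using unsigned arithmetic,
    // which avoids left-shifting a negative value.
    uint32_t top = static_cast<uint32_t>(value) << (32 - offset - bits);
    // Shift back down as a signed value so the field's top bit is
    // replicated. (Arithmetic right shift is guaranteed from C++20 on and
    // is what mainstream compilers do for earlier standards.)
    return static_cast<int32_t>(top) >> (32 - bits);
}
```

Extracting the 4-bit field `0xF` from `0xF0` this way yields -1, whereas a plain shift-and-mask would yield 15, which is the kind of mismatch the new tests are meant to catch.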

csyonghe · 2024-09-09T17:10:26Z

tests/language-feature/bitfield/bitfield-extract-i32.slang

@@ -0,0 +1,56 @@
+//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cpu -output-using-type


We need to cover all the other targets, with this line duplicated where -cpu is replaced with -vk, -d3d, -metal and -cuda.

I'll add these in and see if the existing tests pass. We might also want to consider adding overrides for metal and cuda if these languages have direct bitfield extraction intrinsics.

For CUDA, at least for scalars, we could be using the following PTX:

unsigned int bitfieldExtract(unsigned int val, int pos, int len) {
    unsigned int r;
    asm("bfe.u32 %0, %1, %2, %3;" : "=r"(r) : "r"(val), "r"(pos), "r"(len));
    return r;
}

unsigned int bitfieldInsert(unsigned int src, unsigned int dst, int pos, int len) {
    unsigned int r;
    asm("bfi.b32 %0, %1, %2, %3, %4;" : "=r"(r) : "r"(src), "r"(dst), "r"(pos), "r"(len));
    return r;
}

We'd need to add a .s32 for the sign extension case, but that's not bad. I can look into a solution for Metal as well, but hopefully in the meantime the c-like fallback will work there too.

natevm · 2024-09-09T20:35:53Z

Looks like there are some newly failing tests on my end after adding these flags for the other targets. I'll see if I can figure out what's going on.

For i32 extract:

- Using -cpu throws error : invalid operands to binary expression ('Vector<int32_t, 4>' (aka 'Vector<int, 4>') and 'int32_t' (aka 'int'))
  -- Fixed, was missing a vector type for CPU targets
- All other targets pass.

For i64 extract:

- Using -cpu again throws a similar invalid operand error.
  -- Fixed, was missing a vector type for CPU targets
- Using -vk throws "unexpected type given to bitfieldExtract" in SPIR-V emit. I might have forgotten to ensure 64-bit values work in the SPIR-V specific logic...

The same failure patterns occur for i32/i64 insertion. I'm not sure about Metal yet. Hopefully the CI will give us some answers there.


natevm · 2024-09-10T19:31:32Z

I'm a bit confused with the Vulkan spec... I think I might have found a typo?...

According to the spec:

The Base operand of any OpBitCount, OpBitReverse, OpBitFieldInsert, OpBitFieldSExtract, or OpBitFieldUExtract instruction must be a 32-bit integer scalar or a vector of 32-bit integers.

This appears even if I require SpvCapabilityInt64. At least with NVIDIA, I'm finding that the tests here reveal that 64-bit integers are supported, and do give correct results. But evidently there is ambiguity on what exactly the shaderInt64 capability means.

natevm · 2024-09-10T21:09:02Z

It appears the two outstanding issues are:

1. When SLANG_RUN_SPIRV_VALIDATION=1, SPIR-V validation does not correctly consider how SpvCapabilityInt64 extends the capabilities of the OpBitFieldSExtract/OpBitFieldUExtract/OpBitFieldInsert instructions. This is causing the one Windows CI build to fail. Setting SLANG_RUN_SPIRV_VALIDATION=0 and ignoring the validation errors, I get correct results.

   This appears to be a regression caused by this PR from Samsung: KhronosGroup/SPIRV-Tools#4758. Apparently their hardware does not support 64-bit values for certain intrinsics.

2. For Metal, when targeting release mode, Slang appears to be translating my vec<int64_t, 4> Metal type into int64_t4, which is not an existing type in the Metal shading language.

So for 1, unfortunately because of some restrictions with Samsung's HW, there is no way to really tap into the fast path HW on NVIDIA architectures for non-32-bit types with bitfield extraction / insertion. So I'll have to think about some workaround for non-32-bit types...

For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.

@csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

csyonghe · 2024-09-11T21:18:28Z

You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

csyonghe · 2024-09-11T21:18:55Z

Only release build runs the full suite of tests, debug build doesn't run most tests.

natevm · 2024-09-12T02:53:05Z

> You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.

I did try to use this flag like so:
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -skip-spirv-validation

But this throws an error that -skip-spirv-validation is not a valid command line argument. Perhaps it's because I'm using this TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK) pattern?

The other test using this flag follows a different pattern:
//TEST:SIMPLE(filecheck=SPIRV):-target spirv -entry computeMain -stage compute -emit-spirv-directly -skip-spirv-validation

csyonghe · 2024-09-12T02:55:52Z

For compute tests, you need -slang -skip-spirv-validation.


natevm · 2024-09-13T20:22:28Z

> For 2, this seems like a bug with the current Slang Metal backend, and isn't directly tied to my changes as far as I can tell.
>
> @csyonghe For 2, do you know what's causing the release build on MacOS to fail with vec4(int64_t, N), but the debug build on MacOS to pass?

@csyonghe looks like this is the last major issue at the moment. I don't have a Mac, so I can't reproduce this locally to debug it. It looks like 64-bit integer vectors are broken in general in the Metal backend. Perhaps we should file an issue to resolve that.

csyonghe · 2024-09-13T20:28:15Z

You should be able to fix this by changing the way we emit vector types when the element type is 64bit int in slang-emit-metal.cpp, look for kIROp_VectorType in that file to find the place to emit the type.

You can test this locally on a Windows machine by adding a
//TEST:SIMPLE:-target metal
line to the offending test file. This triggers a compile-only test with the source, which will run the Metal compiler on Windows to give you any Metal compile errors without actually running the shader.


natevm · 2024-09-15T21:19:02Z

@csyonghe looks like all relevant tests are passing.

There’s one windows CI machine which is reporting that bash is missing, but I see the same error on ToT, so I’m assuming that’s okay to see.

natevm

Quick pass over braces

source/slang/slang-emit-c-like.cpp

source/slang/slang-emit-glsl.cpp

source/slang/slang-emit-spirv.cpp

csyonghe · 2024-09-16T18:03:38Z

source/slang/slang-emit-c-like.cpp

+            m_writer->emitRawText(std::to_string(N).c_str());
+        }
+        // Special handling required for Metal target
+        else if (isMetalTarget(getTargetReq()))


We shouldn't need this, and instead we should make sure the metal backend can emit all vector types correctly.

csyonghe · 2024-09-16T18:06:50Z

source/slang/slang-emit-c-like.cpp

+    return true;
+}
+
+void CLikeSourceEmitter::emitVecNOrScalar(IRVectorType* vectorType, std::function<void()> emitComponentLogic)


naming suggestion: maybeEmitMakeVectorFromScalar. Instead of doing call-back style with std::function (which is a virtual function call), just use bool maybeEmitMakeVectorFromScalar(IRType* type); and void maybeCloseMakeVectorFromScalar(bool v).
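
The open/close pair suggested here could be sketched as follows. This is a simplified stand-in, not the real emitter: `SketchEmitter` just accumulates text, the helpers take an element type name and count instead of an `IRType*`, and `emitBroadcastOperand` is a hypothetical caller added to show the usage pattern.

```cpp
#include <string>

// Stand-in for the real emitter: it only accumulates output text.
struct SketchEmitter
{
    std::string out;

    // If elementCount > 1, opens a constructor call such as "int4(" and
    // returns true so the caller knows it has to be closed later.
    bool maybeEmitMakeVectorFromScalar(const std::string& elementType, int elementCount)
    {
        if (elementCount <= 1)
            return false;
        out += elementType + std::to_string(elementCount) + "(";
        return true;
    }

    // Closes the call opened above, if one was opened.
    void maybeCloseMakeVectorFromScalar(bool opened)
    {
        if (opened)
            out += ")";
    }
};

// Hypothetical caller: wraps a scalar operand in a vector constructor
// only when the target type is a vector.
std::string emitBroadcastOperand(const std::string& elementType, int elementCount, const std::string& operand)
{
    SketchEmitter emitter;
    bool opened = emitter.maybeEmitMakeVectorFromScalar(elementType, elementCount);
    emitter.out += operand;
    emitter.maybeCloseMakeVectorFromScalar(opened);
    return emitter.out;
}
```

Compared with passing a `std::function` callback, the caller keeps its own control flow and simply brackets whatever it emits between the two calls.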

csyonghe · 2024-09-16T18:09:30Z

source/slang/slang-emit-metal.cpp

@@ -538,7 +538,8 @@ bool MetalSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inO

 void MetalSourceEmitter::emitVectorTypeNameImpl(IRType* elementType, IRIntegerValue elementCount)
 {
-    emitSimpleTypeImpl(elementType);
+    // NM: Passing count here, as Metal 64-bit vector type names do not match their scalar equivalents.
+    emitSimpleTypeKnowingCount(elementType, elementCount);


I think you can special case the 64bit types in the switch statement below, and emit vec<T, N> instead of TN for those types. We shouldn't need the emitSimpleTypeKnowingCount() change.
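
A rough sketch of that special case, written as a standalone C++ function over type-name strings rather than the actual switch in slang-emit-metal.cpp. The function name `metalVectorTypeName` and the string-based interface are assumptions for illustration only.

```cpp
#include <string>

// Hypothetical helper: returns the spelling of a vector type for Metal
// output. Most element types use the short "TN" form (e.g. "int4");
// 64-bit integer element types use the template form "vec<T, N>".
std::string metalVectorTypeName(const std::string& elementType, int elementCount)
{
    bool is64BitInt = (elementType == "int64_t" || elementType == "uint64_t");
    if (is64BitInt)
        return "vec<" + elementType + ", " + std::to_string(elementCount) + ">";
    return elementType + std::to_string(elementCount);
}
```

This keeps the decision local to vector type emission, so scalar type emission does not need to know the element count.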

natevm · 2024-09-28T23:58:42Z

Apologies on the delay here. Been a bit tied up with a move between states. I should have a bit more time to work on this in the upcoming week.

natevm added 3 commits September 5, 2024 18:22

promoting bitfield extraction and insertion to become intrinsics for …

7d44ee7

…internal compiler use

removing duplicate intrinsics from glsl.meta.slang

554b4f0

reverting some unwanted changes

7eccd6c

csyonghe added the pr: non-breaking PRs without breaking changes label Sep 6, 2024

csyonghe reviewed Sep 6, 2024

source/slang/core.meta.slang

source/slang/slang-emit-c-like.cpp

natevm added 2 commits September 6, 2024 13:54

small formatting fixes

1d6e26c

adding test for bitfield extract, currently failing the sign extensio…

783328f

…n case

adding some tests for 32 bit and 64 bit bitfield insertion and extrac…

00804a1

…tion

csyonghe reviewed Sep 9, 2024


adding more targets to test.

648107f

natevm added 2 commits September 10, 2024 11:48

fixing small regression when emitting component-wise bitfield ops for…

a3529ca

… CPU backend

updating spirv emit to account for non-32-bit ints

d04bb68

some refactoring for metal

4cd73c9

Merge branch 'master' into natevm-bitfield-instructions

e189fd5

natevm added 2 commits September 13, 2024 10:53

making sign extension more robust to order ops which can vary between…

164a01b

… targets

Merge branch 'natevm-bitfield-instructions' of github.com:natevm/slan…

fee1e81

…g into natevm-bitfield-instructions

natevm mentioned this pull request Sep 13, 2024

uint64_t4 missing from Metal backend #5062

Closed

natevm added 7 commits September 15, 2024 11:26

updating slang emit metal to try to correct emitting 64-bit vector types

fe183e2

fixing missed switch cases

8d2a47e

refactoring tests

7eb6b20

adding 16-bit test for bitfield insert

8561801

Merge branch 'master' into natevm-bitfield-instructions

9534072

fixing sign in insert test

fc0f6c8

Merge branch 'natevm-bitfield-instructions' of github.com:natevm/slan…

79d754a

…g into natevm-bitfield-instructions

natevm commented Sep 15, 2024


Apply suggestions from code review

b5e3ca5

csyonghe reviewed Sep 16, 2024


csyonghe mentioned this pull request Nov 4, 2024

Bug on supporting bit_cast<uint64_t>(double) #5470

Open
