Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure to use float reserved register for TYP_SIMD12 #108629

Merged
merged 5 commits into from
Oct 10, 2024

Conversation

kunalspathak
Copy link
Member

For storing SIMD12, we sometimes use int register and other times float register. Because of this, the reserved register pool might have more than one register leading to assert. The fix is to make the reserved register requirement consistent for arm64, loongarch and risc.

Fixes: #108609

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 7, 2024
@kunalspathak kunalspathak changed the title Arm64: Ensure to use reserved register of correct register type Ensure to use float reserved register for TYP_SIMD12 Oct 7, 2024
@kunalspathak
Copy link
Member Author

@shushanhf @dotnet/samsung @dotnet/jit-contrib

Comment on lines 815 to 816
// To assemble the vector properly we would need an additional float register
buildInternalFloatRegisterDefForNode(indirTree);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RISC-V doesn't define FEATURE_SIMD yet but I think it can stay.

@risc-vv
Copy link

risc-vv commented Oct 8, 2024

147f189 is being scheduled for building and testing

GIT: 147f189f58bc72581420232a98aebe0b2e218aba
REPO: dotnet/runtime
BRANCH: main

Release-build FAILED

buildinfo.json

Cloning into '/go-agent/pipelines/Release-build/runtime'...
Updating files:  14% (8730/59374)
Updating files:  15% (8907/59374)
Updating files:  16% (9500/59374)
Updating files:  17% (10094/59374)
Updating files:  18% (10688/59374)
Updating files:  19% (11282/59374)
Updating files:  20% (11875/59374)
Updating files:  21% (12469/59374)
Updating files:  22% (13063/59374)
Updating files:  23% (13657/59374)
Updating files:  24% (14250/59374)
Updating files:  25% (14844/59374)
Updating files:  26% (15438/59374)
Updating files:  27% (16031/59374)
Updating files:  28% (16625/59374)
Updating files:  29% (17219/59374)
Updating files:  30% (17813/59374)
Updating files:  31% (18406/59374)
Updating files:  32% (19000/59374)
Updating files:  33% (19594/59374)
Updating files:  33% (19610/59374)
Updating files:  34% (20188/59374)
Updating files:  35% (20781/59374)
Updating files:  36% (21375/59374)
Updating files:  37% (21969/59374)
Updating files:  38% (22563/59374)
Updating files:  39% (23156/59374)
Updating files:  40% (23750/59374)
Updating files:  41% (24344/59374)
Updating files:  42% (24938/59374)
Updating files:  43% (25531/59374)
Updating files:  44% (26125/59374)
Updating files:  45% (26719/59374)
Updating files:  46% (27313/59374)
Updating files:  47% (27906/59374)
Updating files:  48% (28500/59374)
Updating files:  49% (29094/59374)
Updating files:  50% (29687/59374)
Updating files:  51% (30281/59374)
Updating files:  51% (30805/59374)
Updating files:  52% (30875/59374)
Updating files:  53% (31469/59374)
Updating files:  54% (32062/59374)
Updating files:  55% (32656/59374)
Updating files:  56% (33250/59374)
Updating files:  57% (33844/59374)
Updating files:  58% (34437/59374)
Updating files:  59% (35031/59374)
Updating files:  60% (35625/59374)
Updating files:  61% (36219/59374)
Updating files:  62% (36812/59374)
Updating files:  63% (37406/59374)
Updating files:  64% (38000/59374)
Updating files:  65% (38594/59374)
Updating files:  66% (39187/59374)
Updating files:  67% (39781/59374)
Updating files:  68% (40375/59374)
Updating files:  69% (40969/59374)
Updating files:  70% (41562/59374)
Updating files:  71% (42156/59374)
Updating files:  72% (42750/59374)
Updating files:  73% (43344/59374)
Updating files:  74% (43937/59374)
Updating files:  75% (44531/59374)
Updating files:  76% (45125/59374)
Updating files:  77% (45718/59374)
Updating files:  77% (45746/59374)
Updating files:  78% (46312/59374)
Updating files:  79% (46906/59374)
Updating files:  80% (47500/59374)
Updating files:  81% (48093/59374)
Updating files:  82% (48687/59374)
Updating files:  83% (49281/59374)
Updating files:  84% (49875/59374)
Updating files:  85% (50468/59374)
Updating files:  86% (51062/59374)
Updating files:  87% (51656/59374)
Updating files:  88% (52250/59374)
Updating files:  89% (52843/59374)
Updating files:  90% (53437/59374)
Updating files:  91% (54031/59374)
Updating files:  91% (54496/59374)
Updating files:  92% (54625/59374)
Updating files:  93% (55218/59374)
Updating files:  94% (55812/59374)
Updating files:  95% (56406/59374)
Updating files:  96% (57000/59374)
Updating files:  97% (57593/59374)
Updating files:  98% (58187/59374)
Updating files:  99% (58781/59374)
Updating files: 100% (59374/59374)
Updating files: 100% (59374/59374), done.

@risc-vv
Copy link

risc-vv commented Oct 8, 2024

RISC-V Release-CLR-VF2: 9435 / 9435 (100.00%)
=======================
      passed: 9435
      failed: 0
     skipped: 107
      killed: 0
------------------------
  TOTAL libs: 9542
 TOTAL tests: 9542
   REAL time: 3h 4min 37s 394ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

RISC-V Release-FX-VF2: 567663 / 604018 (93.98%)
=======================
      passed: 567663
      failed: 13
     skipped: 1465
      killed: 36342
------------------------
  TOTAL libs: 257
 TOTAL tests: 605483
   REAL time: 2h 27min 5s 314ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 147f189f58bc72581420232a98aebe0b2e218aba
CI: dc902f49321bed295468a88446a438c5621f465b
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

# CORE_LIBS_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s libs /p:EnableSourceLink=false
# CORE_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s clr+libs+host /p:EnableSourceLink=false

# TESTCLR_BUILD_CMD
runtime/src/tests/build.sh -riscv64 -cross -Release -priority1 -p:UseLocalAppHostPack=true
# TESTCLR_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --core_root ./coreclr.Release/Tests/Core_Root --testhost ./testhost.Release --atest ./coreclr.Release --test ./ --log_dir ./logs  --timeout 2700 --log_level DEBUG --xunit xunit.Release
# TESTCLR_RUN
/var/lib/go-agent/pipelines/Release-CLR-VF2/logs/run_tests.log
cp -R /var/lib/go-agent/pipelines/Release-CLR-VF2/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && __TestDotNetCmd=/var/lib/go-agent/pipelines/Release-CLR-VF2/testhost.Release/dotnet CORE_ROOT=/var/lib/go-agent/pipelines/Release-CLR-VF2/coreclr.Release/Tests/Core_Root  /usr/bin/time -f "exec_time: %e" ./_TEST_BINARY_

# TESTFX_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -rc Release -hc Release -lc Release -s libs.tests --testscope innerloop /p:EnableSourceLink=false /p:UseLocalAppHostPack=true
# TESTFX_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --corefx --testhost ./testhost.Release --atest ./corefx.Release --log_dir ./logs  --timeout 6000 --memlimit 4096 --jobs 8 --log_level DEBUG --xunit xunit.Release
# TESTFX_RUN
/var/lib/go-agent/pipelines/Release-FX-VF2/logs/run_tests.log
cp -R /var/lib/go-agent/pipelines/Release-FX-VF2/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && __TestDotNetCmd=/var/lib/go-agent/pipelines/Release-FX-VF2/testhost.Release/dotnet  /usr/bin/time -f "exec_time: %e" /var/lib/go-agent/pipelines/Release-FX-VF2/testhost.Release/dotnet exec   xunit.console.dll _TEST_BINARY_ -nologo -nocolor -notrait category=failing

# TEST_ENV
DOTNET_JitStress=;DOTNET_JitStressRegs=;DOTNET_GCStress=;DOTNET_JITMinOpts=;DOTNET_TailcallStress=;DOTNET_TieredCompilation=
RISC-V Release-CLR-VF2: 9436 / 9436 (100.00%)
=======================
      passed: 9436
      failed: 0
     skipped: 107
      killed: 0
------------------------
  TOTAL libs: 9543
 TOTAL tests: 9543
   REAL time: 3h 2min 17s 295ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

RISC-V Release-FX-VF2: 567663 / 604018 (93.98%)
=======================
      passed: 567663
      failed: 13
     skipped: 1465
      killed: 36342
------------------------
  TOTAL libs: 257
 TOTAL tests: 605483
   REAL time: 2h 27min 5s 314ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 147f189f58bc72581420232a98aebe0b2e218aba
CI: dc902f49321bed295468a88446a438c5621f465b
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

# CORE_LIBS_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s libs /p:EnableSourceLink=false
# CORE_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s clr+libs+host /p:EnableSourceLink=false

# TESTCLR_BUILD_CMD
runtime/src/tests/build.sh -riscv64 -cross -Release -priority1 -p:UseLocalAppHostPack=true
# TESTCLR_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --core_root ./coreclr.Release/Tests/Core_Root --testhost ./testhost.Release --atest ./coreclr.Release --test ./ --log_dir ./logs  --timeout 2700 --log_level DEBUG --xunit xunit.Release
# TESTCLR_RUN
/var/lib/go-agent/pipelines/Release-CLR-VF2/logs/run_tests.log
cp -R /var/lib/go-agent/pipelines/Release-CLR-VF2/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && __TestDotNetCmd=/var/lib/go-agent/pipelines/Release-CLR-VF2/testhost.Release/dotnet CORE_ROOT=/var/lib/go-agent/pipelines/Release-CLR-VF2/coreclr.Release/Tests/Core_Root  /usr/bin/time -f "exec_time: %e" ./_TEST_BINARY_

# TESTFX_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -rc Release -hc Release -lc Release -s libs.tests --testscope innerloop /p:EnableSourceLink=false /p:UseLocalAppHostPack=true
# TESTFX_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --corefx --testhost ./testhost.Release --atest ./corefx.Release --log_dir ./logs  --timeout 6000 --memlimit 4096 --jobs 8 --log_level DEBUG --xunit xunit.Release
# TESTFX_RUN
/var/lib/go-agent/pipelines/Release-FX-VF2/logs/run_tests.log
cp -R /var/lib/go-agent/pipelines/Release-FX-VF2/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && __TestDotNetCmd=/var/lib/go-agent/pipelines/Release-FX-VF2/testhost.Release/dotnet  /usr/bin/time -f "exec_time: %e" /var/lib/go-agent/pipelines/Release-FX-VF2/testhost.Release/dotnet exec   xunit.console.dll _TEST_BINARY_ -nologo -nocolor -notrait category=failing

# TEST_ENV
DOTNET_JitStress=;DOTNET_JitStressRegs=;DOTNET_GCStress=;DOTNET_JITMinOpts=;DOTNET_TailcallStress=;DOTNET_TieredCompilation=
RISC-V Release-CLR-QEMU: 9436 / 9436 (100.00%)
=======================
      passed: 9436
      failed: 0
     skipped: 107
      killed: 0
------------------------
  TOTAL libs: 9543
 TOTAL tests: 9543
   REAL time: 58min 16s 339ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

RISC-V Release-FX-QEMU: 570453 / 608705 (93.72%)
=======================
      passed: 570453
      failed: 225
     skipped: 1573
      killed: 38027
------------------------
  TOTAL libs: 257
 TOTAL tests: 610278
   REAL time: 2h 22min 22s 350ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 147f189f58bc72581420232a98aebe0b2e218aba
CI: dc902f49321bed295468a88446a438c5621f465b
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

# CORE_LIBS_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s libs /p:EnableSourceLink=false
# CORE_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s clr+libs+host /p:EnableSourceLink=false

# TESTCLR_BUILD_CMD
runtime/src/tests/build.sh -riscv64 -cross -Release -priority1 -p:UseLocalAppHostPack=true
# TESTCLR_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --core_root ./coreclr.Release/Tests/Core_Root --testhost ./testhost.Release --atest ./coreclr.Release --test ./ --log_dir ./logs  --timeout 2700 --log_level DEBUG --xunit xunit.Release
# TESTCLR_RUN
/godata/pipelines/Release-CLR-QEMU/logs/run_tests.log
cp -R /godata/pipelines/Release-CLR-QEMU/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/godata/pipelines/Release-CLR-QEMU/testhost.Release/dotnet CORE_ROOT=/godata/pipelines/Release-CLR-QEMU/coreclr.Release/Tests/Core_Root  /usr/bin/time -f "exec_time: %e" ./_TEST_BINARY_

# TESTFX_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -rc Release -hc Release -lc Release -s libs.tests --testscope innerloop /p:EnableSourceLink=false /p:UseLocalAppHostPack=true
# TESTFX_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --corefx --testhost ./testhost.Release --atest ./corefx.Release --log_dir ./logs  --timeout 6000 --memlimit 4096 --jobs 16 --log_level DEBUG --xunit xunit.Release
# TESTFX_RUN
/godata/pipelines/Release-FX-QEMU/logs/run_tests.log
cp -R /godata/pipelines/Release-FX-QEMU/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/godata/pipelines/Release-FX-QEMU/testhost.Release/dotnet  /usr/bin/time -f "exec_time: %e" /godata/pipelines/Release-FX-QEMU/testhost.Release/dotnet exec   xunit.console.dll _TEST_BINARY_ -nologo -nocolor -notrait category=failing

# TEST_ENV
DOTNET_JitStress=;DOTNET_JitStressRegs=;DOTNET_GCStress=;DOTNET_JITMinOpts=;DOTNET_TailcallStress=;DOTNET_TieredCompilation=
RISC-V Release-CLR-QEMU: 9435 / 9435 (100.00%)
=======================
      passed: 9435
      failed: 0
     skipped: 107
      killed: 0
------------------------
  TOTAL libs: 9542
 TOTAL tests: 9542
   REAL time: 58min 15s 267ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

RISC-V Release-FX-QEMU: 570453 / 608705 (93.72%)
=======================
      passed: 570453
      failed: 225
     skipped: 1573
      killed: 38027
------------------------
  TOTAL libs: 257
 TOTAL tests: 610278
   REAL time: 2h 22min 22s 350ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 147f189f58bc72581420232a98aebe0b2e218aba
CI: dc902f49321bed295468a88446a438c5621f465b
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

# CORE_LIBS_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s libs /p:EnableSourceLink=false
# CORE_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s clr+libs+host /p:EnableSourceLink=false

# TESTCLR_BUILD_CMD
runtime/src/tests/build.sh -riscv64 -cross -Release -priority1 -p:UseLocalAppHostPack=true
# TESTCLR_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --core_root ./coreclr.Release/Tests/Core_Root --testhost ./testhost.Release --atest ./coreclr.Release --test ./ --log_dir ./logs  --timeout 2700 --log_level DEBUG --xunit xunit.Release
# TESTCLR_RUN
/godata/pipelines/Release-CLR-QEMU/logs/run_tests.log
cp -R /godata/pipelines/Release-CLR-QEMU/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/godata/pipelines/Release-CLR-QEMU/testhost.Release/dotnet CORE_ROOT=/godata/pipelines/Release-CLR-QEMU/coreclr.Release/Tests/Core_Root  /usr/bin/time -f "exec_time: %e" ./_TEST_BINARY_

# TESTFX_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -rc Release -hc Release -lc Release -s libs.tests --testscope innerloop /p:EnableSourceLink=false /p:UseLocalAppHostPack=true
# TESTFX_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --corefx --testhost ./testhost.Release --atest ./corefx.Release --log_dir ./logs  --timeout 6000 --memlimit 4096 --jobs 16 --log_level DEBUG --xunit xunit.Release
# TESTFX_RUN
/godata/pipelines/Release-FX-QEMU/logs/run_tests.log
cp -R /godata/pipelines/Release-FX-QEMU/xunit.Release "/_PATH_/_WITH_/_TEST_"/ ; cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/godata/pipelines/Release-FX-QEMU/testhost.Release/dotnet  /usr/bin/time -f "exec_time: %e" /godata/pipelines/Release-FX-QEMU/testhost.Release/dotnet exec   xunit.console.dll _TEST_BINARY_ -nologo -nocolor -notrait category=failing

# TEST_ENV
DOTNET_JitStress=;DOTNET_JitStressRegs=;DOTNET_GCStress=;DOTNET_JITMinOpts=;DOTNET_TailcallStress=;DOTNET_TieredCompilation=

@sirntar
Copy link
Member

sirntar commented Oct 8, 2024

I've scheduled tests to make sure it doesn't affect rv64, but it is as @tomeksowi said: we don't support SIMD yet. Although there are plans to support it in the following year.

@EgorBo
Copy link
Member

EgorBo commented Oct 8, 2024

I guess failures are related? I am puzzled why it needs a float rather than int temp for Simd12 store

@kunalspathak
Copy link
Member Author

I guess failures are related? I am puzzled why it needs a float rather than int temp for Simd12 store

Yes, it is related. I didn't any spec that says which register to pick, so to make it consistent with x86, i picked floating register and also given the fact that GPRs are more common than floating registers. I will investigate the test failure.

@kunalspathak
Copy link
Member Author

I guess failures are related? I am puzzled why it needs a float rather than int temp for Simd12 store

Yes, it is related. I didn't any spec that says which register to pick, so to make it consistent with x86, i picked floating register and also given the fact that GPRs are more common than floating registers. I will investigate the test failure.

Turns out that we end up generating wrong instruction if we use floating point as scratch register:

image

if (isGeneralRegisterOrZR(reg2))
{
fmt = IF_DV_2C; // Alias for 'ins'
break;
}
else if (isVectorRegister(reg2))
{
fmt = IF_DV_2E; // Alias for 'dup'
break;

I converted all the places to instead create int internal register. To fix the original issue, since we can have multiple internal registers specially for TYP_SIMD12 (one for PutArgStk and other for storeloc, for example) and we might assign different registers to them, the fix was to Extract instead of using GetSingle().

@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@kunalspathak
Copy link
Member Author

@EgorBo @dotnet/jit-contrib PTAL

@@ -17424,7 +17424,7 @@ void emitter::emitStoreSimd12ToLclOffset(unsigned varNum, unsigned offset, regNu
else
{
// Extract upper 4-bytes from data
regNumber tmpReg = codeGen->internalRegisters.GetSingle(tmpRegProvider);
regNumber tmpReg = codeGen->internalRegisters.Extract(tmpRegProvider);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, this function checks whether we don't have any spare temps at all - it does the store using two shuffles without temps, otherwise, do a simpler routine with temp, but looks like in some cases we over-allocate temps for it? Given that diffs are fine I guess it's ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, we may have more than 1 temps and Extract make sure to get the first available and remove it from the pool.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stores for SIMD12 are a popular trouble (we had like 3 or 4 bugs with them already) 😢

@kunalspathak kunalspathak merged commit cb256e2 into dotnet:main Oct 10, 2024
115 of 117 checks passed
@kunalspathak kunalspathak deleted the onlyonebit branch October 10, 2024 22:49
@github-actions github-actions bot locked and limited conversation to collaborators Nov 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

osx/arm64: Assertion failed 'genExactlyOneBit(availableSet)' during 'Generate code'
5 participants