Initial support for RISCV vector extension #1716

Draft

wants to merge 9 commits into base: main

Conversation

ita-sc
Contributor

@ita-sc ita-sc commented Dec 28, 2023

Hello there! I've added initial support for the RISC-V vector extension in the EVE library. All unit tests have successfully passed with this patch. To verify, please follow these steps:

Obtain the riscv-gcc-13 toolchain (for sysroot). Set the path to it as the environment variable RISCV_GCC.
Use clang with the patch currently under review (https://github.com/llvm/llvm-project/pull/76510). To build it, navigate to the llvm directory:

 mkdir build
 cd build
 cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=Release ../llvm
 cmake --build . -t clang

Add it to your PATH.
After completing the above steps, you should be able to run unit tests by specifying cmake/toolchain/clang.rvv128.cmake as the toolchain file.
Your contributions are highly appreciated! If you encounter any issues or have questions, feel free to reach out. Thanks for your valuable work!

@ita-sc ita-sc changed the title Init support for RISCV vector extension Initial support for RISCV vector extension Dec 28, 2023
@jfalcou
Owner

jfalcou commented Dec 28, 2023

Thanks a lot for this huge contribution. I will add the RiscV compiler to our CI image so we can have the tests running. We are gonna review this soonish.

@@ -808,6 +814,11 @@ namespace avx512
# endif
# endif
#endif
#if !defined(SPY_SIMD_DETECTED) && defined(__riscv) && defined(__riscv_vector)
# define SPY_SIMD_DETECTED ::spy::detail::simd_version::rvv_
# define SPY_SIMD_IS_RISCV_FLEXIBLE_SVE
Owner

should it be just SPY_SIMD_IS_RISCV_FLEXIBLE ?
Also, care to make an issue/PR over at www.github.com/jfalcou/spy ?

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

Hii.

Thank you for putting in effort!

96 files will take some time to review; I personally will do it in chunks: some comments, then more comments, etc.

static constexpr bool is_fp_v = std::is_floating_point_v<Type>;
static constexpr bool is_signed_v = std::is_signed_v<Type>;

# ifdef EVE_RISCV_REG_CHOOSE
Collaborator

@DenisYaroshevskiy DenisYaroshevskiy Dec 28, 2023

Heavy macro usage seems unjustified on the surface.

Can you maybe explain what you are trying to do and then we figure out how?

Contributor Author

I am trying to create a group of functions that make the best choice of vector type depending on the requested cardinal/type and SEW (selected element width). For example, let's suppose vlen == 128 (a sketch of the selection follows the list below):

  1. you need to work with 16 int8 elements - you can use vector type vint8m1_t
  2. you need to work with 128 uint8 elements - you can use vector type vuint8m8_t
  3. you need to work with 8 float elements - you can use vector type vfloat32m4
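To make the selection rule concrete, here is a hedged sketch (not the PR's actual code; pick_lmul, the fixed vlen constant, and the negative encoding for fractional LMUL are assumptions for illustration):

  #include <cstddef>
  #include <cstdint>

  // Assumed fixed VLEN of 128 bits; positive results mean whole register groups
  // (m1, m2, m4, m8), negative results encode fractional groups (mf2, mf4, mf8).
  inline constexpr std::ptrdiff_t vlen = 128;

  template<typename T>
  consteval std::ptrdiff_t pick_lmul(std::ptrdiff_t n)
  {
    std::ptrdiff_t const needed = n * 8 * static_cast<std::ptrdiff_t>(sizeof(T)); // bits requested
    if( needed >= vlen ) return needed / vlen;    // 1, 2, 4 or 8 full registers
    else                 return -(vlen / needed); // -2, -4, -8 -> mf2, mf4, mf8
  }

  static_assert(pick_lmul<std::int8_t>(16)    == 1);  // 16 x int8   -> vint8m1_t
  static_assert(pick_lmul<std::uint8_t>(128)  == 8);  // 128 x uint8 -> vuint8m8_t
  static_assert(pick_lmul<std::int8_t>(4)     == -4); // 4 x int8    -> vint8mf4_t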

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

I lack understanding of RISC-V SIMD fundamentals.
I see this doc https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc but it's huge and contains a lot of information that I don't immediately need.

I also can't find the intrinsics descriptions.

What are you using?

//==================================================================================================
#pragma once

#include <eve/detail/function/friends.hpp>
Collaborator

That's not an include you should use

Contributor Author

OK, I wanted to use it because I need self_neq, but it turns out I do not need it. Removed.

EVE_FORCEINLINE logical<wide<T, N>>
rvv_true()
{
static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
Collaborator

I'm struggling with riscv_rvv_dyn_. I can't find where it's defined.
Is it that different machines use a different number of bits per logical element?

Like on one machine a logical is represented by 1 bit, and on another by 8 bits?

Contributor Author

It is defined in tags, and I use it as a common place for various support functions, like the calculation of LMUL (vector register group count: the number of vector registers that participate in one operation), as well as standard EVE functionality (e.g. expected_cardinal).

different machines use different number of bits per logical element

Renamed. There I calculate ratio = SEW/LMUL.

{
static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
static constexpr size_t size = sizeof(T) * 8;
static constexpr size_t bit_size = lmul > 0 ? size / lmul : size * (-lmul);
Collaborator

this seems to be copypasted a lot. Should we just have logical<wide<T, N>>::platform_bit_size or smth?

Also - really - this is changing depending on machine???

Contributor Author

Should we just have logical<wide<T, N>>::platform_bit_size or smth?

This is not the platform_bit_size of the logical; it is the number needed to form the right type. This number is equal to SEW/LMUL. I renamed it to ratio.

Also - really - this is changing depending on machine?

Well, it depends on the VLEN of your machine.

For example, suppose you have VLEN == 128. To operate on 8 int32 with one instruction, you need to set LMUL (the number of consecutive vector registers that participate as one operand) to M2 (you need 2 registers), and the mask for this needs the type vbool16_t (32/2 = 16).

If you have VLEN == 256, you can operate on 8 int32 with LMUL == M1, so you need to use vbool32_t (32/1 = 32).
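In other words (a hedged sketch; mask_ratio is not a helper from the PR, and the negative encoding of fractional LMUL is assumed from the snippets above):

  #include <cstddef>

  // ratio = SEW / LMUL, i.e. the N in the vboolN_t mask type.
  constexpr std::size_t mask_ratio(std::size_t sew_bits, std::ptrdiff_t lmul)
  {
    return lmul > 0 ? sew_bits / static_cast<std::size_t>(lmul)
                    : sew_bits * static_cast<std::size_t>(-lmul);
  }

  static_assert(mask_ratio(32, 2) == 16); // VLEN=128: 8 x int32 needs LMUL=M2 -> vbool16_t
  static_assert(mask_ratio(32, 1) == 32); // VLEN=256: 8 x int32 fits in M1    -> vbool32_t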

//================================================================================================
template<std::size_t Size> struct rvv_abi_
{
static_assert(CHAR_BIT == 8, "[eve riscv] - For riscv we expect CHAR_BIT to be 8");
Collaborator

we expect that everywhere - I would remove this.

Contributor Author

Removed.

static constexpr auto lmul = riscv_rvv_dyn_::getLMUL<T>(N::value);
static constexpr size_t size = sizeof(T) * 8;
static constexpr size_t bit_size = lmul > 0 ? size / lmul : size * (-lmul);
if constexpr( bit_size == 1 ) return __riscv_vmclr_m_b1(N::value);
Collaborator

my lack of knowledge here shows through heavily - what is the difference between different clears? I'd expect all clears to be the same.

Contributor Author

They return different types.

For example, __riscv_vmclr_m_b1 returns vbool1_t, and __riscv_vmclr_m_b8 results in vbool8_t.

auto type_size = sizeof(Type);
if( type_size == 1 ) return 2;
if( type_size == 2 ) return 2;
if( type_size == 4 ) return 2;
Collaborator

this looks weird. Maybe add a comment and return 2 if it's less than 4?

Contributor Author

Yep, fixed

{};
#else
struct riscv_rvv_dyn_ : rvv_abi_<1>
{};
Collaborator

this doesn't make sense to me. Is it like a fallback? If we can't use rvv, the rvv backend should not be used by eve.

Contributor Author

The idea is the following:

You need to specify VLEN (the bit size of one vector register) when invoking the compiler, by passing -mrvv-vector-bits=size. The compiler then defines __riscv_v_fixed_vlen equal to the number you passed (currently I use 128).

As tags is included without the information that the current platform is RISC-V with the vector extension, we should check that __riscv_v_fixed_vlen is defined, and if not, define some empty riscv_rvv_dyn_ just so compilation does not break.
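A hedged sketch of that guard (only the #else branch is visible in the diff above; the stub rvv_abi_ and the __riscv_v_fixed_vlen / 8 parameterisation are my assumptions, not the PR's exact code):

  #include <cstddef>

  template<std::size_t Size> struct rvv_abi_ {}; // stub standing in for the template quoted above

  // Only enable the RVV ABI tag when the compiler was given a fixed VLEN,
  // e.g. clang -march=rv64gcv -mrvv-vector-bits=128; otherwise define an empty
  // tag so that including the header still compiles on non-RVV configurations.
  #if defined(__riscv_v_fixed_vlen)
  struct riscv_rvv_dyn_ : rvv_abi_<__riscv_v_fixed_vlen / 8> {};
  #else
  struct riscv_rvv_dyn_ : rvv_abi_<1> {}; // placeholder, never actually selected
  #endif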

#endif

//================================================================================================
// Dispatching tag for ARM SIMD implementation
Collaborator

not arm

Contributor Author

My bad, fixed to RISC-V

};

//================================================================================================
// SVE extensions tag objects
Collaborator

comments

Contributor Author

Fixed

// RISCV SVE ABI concept
//================================================================================================
template<typename T>
concept rvv_abi = detail::is_one_of<T>(detail::types<riscv_rvv_dyn_> {});
Collaborator

why dyn

Contributor Author

Well, I wanted to note that we should use this template for any VLEN. I can rename it if you want.

, &data1[idx1] - 1
, eve::as_aligned(&data2[idx2],typename w8_t::cardinal_type{})
Collaborator

the test was correct. It's testing fixed size wide

Contributor Author

The problem here was that the wide type associated with data1 has 16 elements, so during the load it would load more than needed; changed. The current version increases the fixed wide size.

Collaborator

I don't understand you. w8_t is 8 elements. This verifies that you can load 8 elements. Not 16.

What fails, and how?

test/unit/api/regular/wide.cpp (resolved)
@DenisYaroshevskiy
Collaborator

I think I need to understand more about risc v simd/vector code. What docs do you use? I only see a very large GitHub repo, that has a lot of information and no intrinsics.

Can you share smth?

@ita-sc
Contributor Author

ita-sc commented Dec 29, 2023

I think I need to understand more about risc v simd/vector code. What docs do you use? I only see a very large GitHub repo, that has a lot of information and no intrinsics.

Can you share smth?

Yes, there is a separate repository: https://github.com/riscv-non-isa/rvv-intrinsic-doc

There you need release v1.0-rc0

EVE_FORCEINLINE wide<T, N>
perform_load(logical<wide<T, N>> mask, as<wide<T, N>> tgt, PtrTy p)
{
auto zero_init = make(as<wide<T, N>> {}, static_cast<T>(0));
Collaborator

wide<T, N> zero_init{0};

or

auto zero_init = eve::zero(tgt);

Contributor Author

Fixed.

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

I skimmed a bit of the doc and will look more. In the meantime, I really think that heavy macro usage is not helpful. Let's try to figure out how we can use fewer.

include/eve/arch/riscv/as_register.hpp (resolved)
Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

I will do a deeper dive later - there is still a lot. Thank you for cleaning up macros. I think there are some simplifications that can be made.

FYI: we usually don't enable all tests for a new arch in one go. We start with internal.exe, then core.exe, and then add more targets. You don't need to change this for this PR, just FYI.

FYI2: in my comments I went from bottom to top (in case that seems confusing).

}
}
}
else { TTS_PASS("For RISC-V uint8 not enough to store element index."); }
Collaborator

Let's instead just pass the cardinal explicitly on rvv.

  auto alg0 = eve::algo::min_element         //
      [eve::algo::single_pass]              //
      [eve::algo::index_type<std::uint8_t>] //
      [eve::algo::unroll<2>];
  auto alg = [&]{
     if constexpr( eve::expected_cardinal_v<std::uint8_t> < 128) {
        return alg0;
     } else {
        return alg0[eve::algo::force_cardinal<64>];
     }
  }();

Contributor Author

Done

@@ -17,68 +17,79 @@ TTS_CASE("Min element one pass, uint8 index")
[eve::algo::single_pass] //
[eve::algo::index_type<std::uint8_t>] //
[eve::algo::unroll<2>];
if constexpr( eve::current_api != eve::rvv )
Collaborator

same here as suggested below

Contributor Author

Done

if constexpr (eve::current_api != eve::rvv) {
TTS_CONSTEXPR_EXPECT(match(eve::detail::categorize<T>(), lanes));
TTS_CONSTEXPR_EQUAL(eve::detail::categorize<T>(), uint8 | lanes);
}
Collaborator

@jfalcou - I don't understand this test. What should be done here.

test/unit/api/regular/wide.cpp (resolved)
//==================================================================================================
#pragma once

namespace eve::detail
Collaborator

I would expect the emulation code to work here. What happened?

Contributor Author

Well, this is not clear to me either; I just checked that arm sve does the same thing. Moreover, I actually had tests that fail if I don't implement slide_left/right, for example unit.api.regular.swizzle.slide_left.

{
template<scalar_value T, typename N, std::ptrdiff_t Shift>
EVE_FORCEINLINE auto
slide_left_(EVE_SUPPORTS(rvv_), logical<wide<T, N>> v, index_t<Shift>) noexcept
Collaborator

again, emulation should do this, why doesn't that work

logical<wide<U, N>> const &b) noexcept
requires rvv_abi<abi_t<T, N>>
{
return self_neq(a, bit_cast(b, as<logical<wide<T, N>>> {}));
Collaborator

a != bit_cast(b, as(a));

Contributor Author

Done

logical<wide<T, N>> masked = __riscv_vmand(v0.storage, m, N::value);
return last_true(masked);
}
}
Collaborator

are you sure you need this one.

if( v0.get(i) ) return i;
}
return {};
}
Collaborator

I am not wild about doing this, tbh. Is there really no good solution in the ISA? Should be something easy-ish.

You can iota + mask + maximum for example.

Contributor Author

Actually, I suppose we can be faster, but I would prefer to do that in a different MR, as it will require some investigation.

For example, I was thinking about casting it to several u64 values and searching there. Unfortunately, I could not find an instruction in the RVV ISA that finds the position of the last set bit.

auto bitnot_res = self_bitnot(v0_copy);
return self_bitand(bitnot_res, v1);
}
}
Collaborator

I suspect this file can be deleted

Contributor Author

Yep, removed

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

Looked a bit more.

I am so far not convinced about riscv_rvv_dyn_ and using len multipliers in code. Let's first clean up everything else and then come back to this


to_return &= to_clean.to_ullong();
return to_return;
}
Collaborator

oof. not amazing. But I guess you can't do anything. Maybe we can just completely not have this method.

Contributor Author

Well, actually, this method is turned off for almost every logical on RISC-V.

if constexpr( out_lmul == 4 ) return __riscv_vlmul_ext_u8m4(a);
if constexpr( out_lmul == 8 ) return __riscv_vlmul_ext_u8m8(a);
}
}
Collaborator

On a surface level, I don't think that extract should be used in bit_cast.

Contributor Author

This is an "extend" operation; I use it when the underlying type needs to be widened to support the expected type. For example, for the "char" type we support using 1/8 of a real vector register (called mf8), but when you cast it to "int", the smallest fraction you can use is 1/2 of a register (mf2).

This is documented in Chapter 7, "Type system", of https://github.com/riscv-non-isa/rvv-intrinsic-doc

Collaborator

if they are treated as different sizes, bitcasting them wouldn't be correct.

You can't bitcast xmm into ymm in our system, for example.

If every cardinal is fundamental, why are you trying to do this?

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy Aug 17, 2024

, for "char" type we support using 1/8 part of real vector register (called mf8), but when you cast it to "int", you can at least use 1/2 part of register(mf2).

I'm not sure I understand what's happening here.

4 bytes is 4 bytes. Both would be

std::max(1 / 8, 4 / VLEN) * VLEN

Contributor Author

As the rvv intrinsics for RISC-V usually do not know the vlen, they have to fit any available vlen. Because of this, some types that correspond to fractional usage of a real vector register are unavailable.

For example, you cannot fit an int into 1/4 of a vector register when vlen=64. So the rvv intrinsics interface has no type that represents 1/4 of a vector register with element type int.

However, for char you can fit at least one element into 1/8 of a vector register even if vlen is 64. So the type that corresponds to 1/8 of a vector register with element type char is available.

In the current review I try to use RISC-V vector registers in the most effective way, so I use the smallest possible type for each wide. Because of the above, that can result in different underlying type sizes even when sizeof(T1)*N1 == sizeof(T2)*N2.
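As a hedged illustration of that constraint (min_fraction is mine, not a helper from the PR; it encodes the smallest fractional LMUL the v1.0 intrinsics expose per element width):

  #include <cstddef>
  #include <cstdint>

  // Smallest register fraction available per element width: every type must
  // still hold at least one element at the minimum VLEN the intrinsics assume.
  // 8 means mf8, 4 means mf4, 2 means mf2, 1 means m1.
  consteval int min_fraction(std::size_t element_bytes)
  {
    if( element_bytes == 1 ) return 8; // char:  vint8mf8_t exists
    if( element_bytes == 2 ) return 4; // short: vint16mf4_t exists
    if( element_bytes == 4 ) return 2; // int:   vint32mf2_t exists, no vint32mf4_t
    return 1;                          // 64-bit elements start at m1
  }

  static_assert(min_fraction(sizeof(char)) == 8);
  static_assert(min_fraction(sizeof(std::int32_t)) == 2);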

if constexpr( is_aggregated_v<abi_t<T, N>> || is_aggregated_v<abi_t<U, M>> ) static_assert(false);
if constexpr( is_aggregated_v<abi_t<T, typename N::combined_type>>
|| is_aggregated_v<abi_t<U, M>> )
static_assert(false);
Collaborator

aggregated won't be here, because it is not rvv_abi.

Also, if sizeof(T) * N == sizeof(U) * M, it should work.

Contributor Author

Yep, removed

EVE_FORCEINLINE wide<U, M>
bit_cast_(EVE_SUPPORTS(rvv_), wide<T, N> const& x, as<wide<U, M>> const& to_as) noexcept
requires rvv_abi<abi_t<T, N>> && rvv_abi<abi_t<U, M>> && same_wide_size<T, N, U, M>
&& (sizeof(T) * N::value > sizeof(U) * M::value)
Collaborator

You can only bit_cast if sizeof(T) * N::value == sizeof(U) * M::value - this overload is invalid

Contributor Author

Removed

to_mask(rvv_ const&, logical<wide<T, N>> p) noexcept
{
return bit_cast(p.bits(), as<typename logical<wide<T, N>>::mask_type> {});
}
Collaborator

Why this overload when the previous overload exists? It should be one or the other, unless I am missing something.

{
return self = __riscv_vmul(self, static_cast<T>(other), N::value);
}
if constexpr( match(c, category::float_) )
Collaborator

you can just use else for the 2 cases, it's fine.


RVV_BIT(self_bitand, __riscv_vand)
RVV_BIT(self_bitxor, __riscv_vxor)
RVV_BIT(self_bitor, __riscv_vor)
Collaborator

this macro gotta go

Contributor Author

Removed

i_t to_cast_res = __riscv_vnot(self_cast, N::value);
self = bit_cast(to_cast_res, as(self));
return self;
}
Collaborator

if constexpr (match(cat, unsigned_)) {
  __riscv_vnot
} else {
  // to unsigned and recurse
}

Contributor Author

Done

constexpr auto c = categorize<wide<T, N>>();
if constexpr( match(c, category::uint_) )
{
auto shift_casted = convert(shift, as<as_integer_t<U, unsigned>>());
Collaborator

this is the same between branches

Contributor Author

Moved it up.

RVV_LOGIC(self_geq, __riscv_vmsge, __riscv_vmsgeu, __riscv_vmfge)
RVV_LOGIC(self_leq, __riscv_vmsle, __riscv_vmsleu, __riscv_vmfle)
RVV_LOGIC(self_eq, __riscv_vmseq, __riscv_vmseq, __riscv_vmfeq)
RVV_LOGIC(self_neq, __riscv_vmsne, __riscv_vmsne, __riscv_vmfne)
Collaborator

please no macros like this. We can figure out a template to do this if needed.

Contributor Author

Removed

@ita-sc
Contributor Author

ita-sc commented Aug 14, 2024

Hi again.
Sorry for the long delay on this review: I was actually waiting for LLVM to support the functionality necessary for this patch to be usable without additional modifications.
Though I tried to address all review comments, I suggest starting this review over.

@DenisYaroshevskiy
Collaborator

Welcome back, glad to have you. Will do as last time I guess: piecemeal. I will do a pass, we work through it, then I do another pass. I unfortunately can't do it all at once; this is a very big piece of work.


static consteval auto find_vint_mf8()
{
constexpr size_t bin_size = sizeof(Type) * 8;
Collaborator

we do std::size_t

Contributor Author

Fixed

@@ -24,7 +24,7 @@ template <typename T, typename Algo, typename Op>
void transform_keep_if_generic_test_aligned_ptr(eve::as<T>, Algo alg, Op op)
{
using e_t = eve::element_type_t<T>;
alignas(64) std::array<e_t, 23> data;
Collaborator

Why? Using double cardinal?

Contributor Author

Because later we have eve::as_aligned(data.begin()) without a cardinal, it uses the default cardinal, which for the short type is (128/16)*8 = 64 (the *8 appears because, at maximum, we use 8 real vector registers). That results in an alignment requirement of 64 * 2 = 128 bytes.

This could also be fixed by writing eve::as_aligned(data.begin(), eve::fixed<8>{}) (or whatever value makes sure it will not be greater than 64).
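For reference, the arithmetic behind that (assuming VLEN = 128 and at most 8 registers per group, as described above):

  #include <cstdint>

  // Default cardinal for a 16-bit element with VLEN = 128 and up to 8 registers,
  // and the resulting byte alignment it implies for an aligned pointer.
  static_assert((128 / (8 * sizeof(std::int16_t))) * 8 == 64);
  static_assert(64 * sizeof(std::int16_t) == 128); // bytes, exceeding alignas(64)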

Collaborator

Unfortunately, if you use just 8 * register size, you are not testing aggregation.
As much as it pains me to say this, you need to go 16 * register size. So 256.


EVE_FORCEINLINE constexpr explicit operator bool()
{
return __riscv_vcpop(storage, static_size) > 0;
Collaborator

!=

Contributor Author

Fixed

{
return EVE_DISPATCH_CALL(a, tgt);
}
#endif
Collaborator

This doesn't make any sense to me. bit_cast means specifically copying to the same size. That's what it does.

What is the problem you are solving?

UPD: I think I see - you want to bitcast between smaller and larger cardinals. But, I guess, they come out as different sizes.

Why do you need to bitcast between them? If the spec says we have all those small registers, why not keep them?

Contributor Author

Why do you need to bitcast between them?

There are two main reasons.

The first and main one is when I need to bitcast between, for example, wide<char, 4> and wide<int, 1>. They have the same sizeof(T) * N, but for RISC-V the underlying type sizes will be different:

  • wide<char, 4> can be represented as a fraction of a real vector register in the RISC-V vector intrinsics (for vlen=128 this is 1/4 of a real vector register).
  • For wide<int, 1> we would also like to represent it, with vlen=128, as 1/4 of a vector register. However, the RISC-V vector intrinsics have a restriction: they cannot set LMUL to MF4 (1/4) for elements of type int. This restriction seems to come from the fact that the intrinsics must work on any vector unit configuration they support; for example, with vlen=64 an int would not fit into 1/4 of a vector register. So in the intrinsics interface we have to use the type that corresponds to at least 1/2 of a real vector register.

That is why I need to "increase" LMUL, and so the underlying sizes end up different.

The second one is that I sometimes need to cast between wides with different cardinals but the same element type (for example in test/test.h). In EVE you usually just take .storage() and construct from it a new wide with the increased cardinal. For RISC-V this is unavailable (as the underlying types may differ), so for now I've decided to make bit_cast handle this too. If there is a better approach, I can try to refactor it.

Collaborator

I see - what you are saying is that bit_casting from wide<char, 4> to wide<int, 1> should work, regardless of the type.

I'd agree with that - but this requires clause is very wrong. It'd have to say "sizeof(T) * N1::value == sizeof(U) * N2::value".

I'll have a look this weekend to make that change for you in master.

Contributor Author

Thanks.

Also, how should bit_cast be used on logicals?

I've used mask types (i.e. a pack of bits) as logicals for RISC-V. Is bit_cast from wide<char, 1> to logical<wide<int, 8>> allowed (since sizes are equal)?

I ask since for some functions none of the overloads suited RISC-V (so I have written RISC-V specific ones).
E.G.:

return bit_cast(p, as<type>{});

This casts between eve::logical<eve::wide<T, eve::fixed<N>>> and eve::wide<T, eve::fixed<N>>, but for RISC-V these types differ in length.

if constexpr( out_lmul == 2 ) return __riscv_vlmul_ext_f64m2(a);
if constexpr( out_lmul == 4 ) return __riscv_vlmul_ext_f64m4(a);
if constexpr( out_lmul == 8 ) return __riscv_vlmul_ext_f64m8(a);
}
Collaborator

we do else after return in if constexpr chains.

Contributor Author

Yep, I thought about it, but for RISC-V this would make a huge chain of if constexpr (...) ... else if constexpr (...) ... else if constexpr (...), which would make the code less readable and understandable. I try to return from each if statement, so that while debugging it is easier to find the problem.

Collaborator

We have those huge chains, this is how we do it

Contributor Author

Done


/* True value */ static_cast<int_type>(-1),
/* False Value */ static_cast<int_type>(0));
return if_else_res;
}
Collaborator

What's true on RISC-V? Is it not ffff?

Contributor Author

In RISC-V terms you have a mask type (actually this is just a set of operations, no separate registers) that consists of separate bits: the n-th bit refers to the n-th element position, and true/false is the value of that bit. The mask can be used in operations to mark whether the corresponding "lane" (element) should take part in the operation or be ignored.

Here you return something corresponding to the original mask; I just copied what arm sve returns, and from my point of view I only need to distinguish between false and true.

Collaborator

this doesn't strike me as a good idea.
sve doesn't do top_bits very well.

You want something (ideally an integer) that is all 1111s for true and all 0s for false.

Can RISC-V do that?

Contributor Author

This is exactly what is done. I return a wide<T, N>, where T is unsigned with the appropriate size, filled with all ones where the corresponding bit of the logical was set and 0's otherwise.

Representing the logical itself as a wide<T, N> with all ones for true and all zeroes for false would be inefficient for several reasons:

  • before a mask operation I would need to cast it back to the real mask type, which is time-consuming
  • I would need many more vector registers to keep this logical (up to 8 instead of always 1 for the real mask type).

that_t wider_h_placed = __riscv_vslideup(wider_h, wider_h, shift_size, combined_vl);
auto mask_all_ones = rvv_true<T, N>();
auto wider_mask = bit_cast(mask_all_ones, as<logical<that_t>> {});
return if_else(wider_mask, wider_l, wider_h_placed);
Collaborator

+1


return that;
}
else { return to_logical(eve::combine(l.mask(), h.mask())); }
Collaborator

+1

if constexpr( match(c, category::integer_) ) return self = __riscv_vmul(self, y, N::value);
if constexpr( match(c, category::float_) ) return self = __riscv_vfmul(self, y, N::value);
}
template<plain_scalar_value T, value U, typename N>
Collaborator

Please add new lines between functions.

Also

// self_mult --------------------

// self_add --------------------

This file blurs together a bit.

Contributor Author

Changed

Collaborator

@DenisYaroshevskiy DenisYaroshevskiy left a comment

I'll be honest, this'll go much better if you split the pull request.
I'm not gonna insist - I really appreciate you putting in work.
But I'm also not going to let through things just because it's a big review, we care for the project.

For a new platform, we usually just make some targets build and then add more.

How would you feel about splitting out unit.arch.exe? unit.internals.exe? unit.memory.exe? Then we do some chunk and do more in the next bit.


@DenisYaroshevskiy
Collaborator

Is there any reason to use vlen > 1 over us just unrolling it manually?

With this patch the unit.arch tests fully pass.
With this patch all unit.internals tests pass for RISC-V.
With this patch all unit.core tests pass for RISC-V.
As the RISC-V default cardinal for some types is bigger than 8, this
test earlier resulted in reading uninitialized data.

With this patch all tests from unit.memory pass for RISC-V with
vlen=128.
With this patch, all tests from the examples pass for
RISC-V with vlen=128.
With this patch all unit.api tests pass.

The test that requires constructing a wide from storage is removed for riscv,
as by default we cannot construct a wide with a different cardinal type
from the same underlying type.
With this patch all unit.algo tests pass for RISC-V.
@ita-sc
Contributor Author

ita-sc commented Aug 17, 2024

I'll be honest, this'll go much better if you split the pull request.

I've split the patch into several. Should I create a separate review for the first commit?

@DenisYaroshevskiy
Collaborator

Yeah - please separate review.


@@ -38,12 +38,15 @@ TTS_CASE_TPL( "Check top bits raw type", eve::test::simd::all_types)
{
using v_t = eve::element_type_t<T>;
using logical = eve::logical<T>;
using c_t = eve::cardinal_t<T>;
using rvv_logical_type = eve::logical<eve::wide<v_t, c_t>>;
Collaborator

This is just logical

@@ -32,7 +32,8 @@ TTS_CASE_WITH( "Check eve::wide enumerating constructor"

// Test smaller size wide for non-garbage
using v_t = typename T::value_type;
if constexpr( T::size() < eve::fundamental_cardinal_v<v_t> && !eve::has_emulated_abi_v<T> )
if constexpr( T::size() < eve::fundamental_cardinal_v<v_t> && !eve::has_emulated_abi_v<T>
&& eve::current_api != eve::rvv )
Collaborator

What doesn't work for rvv here? This seems like it should.

ids[id] = P::idxs[id_from_pattern] * G + id % G;
}
wide<as_integer_t<T, unsigned>, N> offsets(ids);
wide<T, N> res = __riscv_vrgather_tu(x, x, offsets.storage(), N::value);
Collaborator

Gather is level 3 - it needs a mask.

Here is how you do it:

shuffle_l3_svetbl(P, fixed<G>, wide<T, N> x)

Let me know if that doesn't make sense, and I'll help more.

@ita-sc ita-sc marked this pull request as draft September 7, 2024 06:17