Enable SVE Support for L2 Metric Computation in FP32 #969

adarshs1310 · 2024-11-29T10:34:36Z

Description:
This PR introduces SVE (Scalable Vector Extension) enablement for L2 metric computation in FP32. The changes enhance performance for most indexing methods compared to NEON, with observed speed-ups across multiple algorithms.

Changes in This PR:

Added SVE optimizations for L2 metric computation in FP32.
Updated CMakeLists.txt to include support for -march=armv8-a+sve.
Refactored compute kernels to leverage SVE intrinsics.

Benchmark Results:

Performance benchmarks (32 vCPUs) were conducted using both NEON and SVE on ARM architecture. Below are the results showcasing execution times (in seconds):

Key observations:

All of the algorithms exhibit performance gains with SVE when compared with NEON.

Note: SVE support has been made optional, as not all functions have been fully enabled yet. To utilize SVE, please compile with -march=armv8-a+sve.

/kind feature
Fixes #782

sre-ci-robot · 2024-11-29T10:34:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adarshs1310
To complete the pull request process, please assign zhengbuqian after the PR has been reviewed.
You can assign the PR to them by writing /assign @zhengbuqian in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot · 2024-11-29T10:34:46Z

Welcome @adarshs1310! It looks like this is your first PR to zilliztech/knowhere 🎉

mergify · 2024-11-29T10:35:13Z

@adarshs1310 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

If you're fixing a bug, label it as kind/bug.
For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

alexanderguzhva · 2024-11-29T22:00:37Z

src/simd/distances_sve.cc

+    svbool_t pg = svptrue_b32();
+
+    while (i < d) {
+        if (d - i < svcntw())


I believe that this if condition is not needed, just pg = svwhilelt_b32(i, d); should be sufficient

adarshs1310 · 2024-11-30

Thank you so much, @alexanderguzhva, for the valuable suggestions. During development we considered this approach as well; our reason for keeping the current one is as follows:

Using the if condition to update pg only in the last iteration avoids unnecessary updates and reduces the dependency chain introduced by the svwhilelt instruction. This optimization minimizes stalls caused by these dependencies, allowing the processor pipeline to operate more efficiently.
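For readers following this thread, here is a minimal sketch of the variant the reviewer suggests, where the predicate is recomputed with `svwhilelt` on every iteration so the tail needs no branch. It is not the code from this PR: the function names `l2sqr_ref` and `l2sqr_sketch` are hypothetical, and the SVE path is only compiled when the compiler defines `__ARM_FEATURE_SVE` (for example with `-march=armv8-a+sve`); otherwise the scalar reference is used.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Scalar reference: used as the fallback and as a correctness baseline.
float l2sqr_ref(const float* x, const float* y, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical kernel: squared L2 distance with a per-iteration predicate.
float l2sqr_sketch(const float* x, const float* y, std::size_t d) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    const uint64_t n = d;
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // Lanes whose index is >= n are inactive, so the last partial
        // vector is handled by the same code path as the full ones.
        const svbool_t pg = svwhilelt_b32_u64(i, n);
        const svfloat32_t vx = svld1_f32(pg, x + i);
        const svfloat32_t vy = svld1_f32(pg, y + i);
        const svfloat32_t diff = svsub_f32_x(pg, vx, vy);
        // Merging form: inactive accumulator lanes keep their old value.
        acc = svmla_f32_m(pg, acc, diff, diff);
    }
    // Horizontal add across all lanes of the accumulator.
    return svaddv_f32(svptrue_b32(), acc);
#else
    return l2sqr_ref(x, y, d);
#endif
}
```

Whether this form or the branch-on-tail form is faster depends on the core; the sketch only shows that both compute the same result, not which one performs better.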

alexanderguzhva · 2024-11-29T22:09:07Z

cmake/libs/libfaiss.cmake

@@ -48,7 +48,7 @@ endif()

 if(__AARCH64)
  set(UTILS_SRC src/simd/hook.cc src/simd/distances_ref.cc
-                src/simd/distances_neon.cc)
+                src/simd/distances_neon.cc src/simd/distances_sve.cc)


I believe that this is not sufficient.
Knowhere is designed as a library that picks function pointers according to CPU capabilities, detected upon the start.
For example, SSE / AVX2 / AVX512 code files have different corresponding compile options

knowhere/cmake/libs/libfaiss.cmake

Lines 37 to 40 in 1cb9937

target_compile_options(utils_sse PRIVATE -msse4.2 -mpopcnt)

target_compile_options(utils_avx PRIVATE -mfma -mf16c -mavx2 -mpopcnt)

target_compile_options(utils_avx512 PRIVATE -mfma -mf16c -mavx512f -mavx512dq

-mavx512bw -mpopcnt -mavx512vl)

So, it seems to be logical that the new SVE code should also contain some form of different flags, such as -march=armv8-a+sve for distances_sve.cc. And I don't believe that I see this.

adarshs1310 · 2024-11-30

Thank you for your valuable feedback, @alexanderguzhva.

We considered two approaches and initially went with the first: keeping NEON as the default until all FP32 functions have been fully implemented for SVE.

Your approach is absolutely correct, and we agree that the Neon fallback for non-SVE functions can be effectively handled using hook.cc.

We have incorporated these changes into our solution.
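The function-pointer selection described in this thread can be sketched roughly as follows. This is an illustration only, not the contents of hook.cc: `l2sqr_fn`, `l2sqr_baseline`, `l2sqr_wide`, `l2sqr_hook` and `setup_hooks` are hypothetical names, and both kernels are plain C++ stand-ins for files that would normally be compiled with different ISA flags.

```cpp
#include <cstddef>

// Signature shared by every L2 kernel variant in this sketch.
using l2sqr_fn = float (*)(const float*, const float*, std::size_t);

// Stand-in for the portable kernel (built with the default flags).
float l2sqr_baseline(const float* x, const float* y, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        sum += diff * diff;
    }
    return sum;
}

// Stand-in for a kernel from a file built with extra ISA flags.
// It uses two partial sums only to be distinguishable from the baseline.
float l2sqr_wide(const float* x, const float* y, std::size_t d) {
    float even = 0.0f;
    float odd = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float diff = x[i] - y[i];
        if (i % 2 == 0) {
            even += diff * diff;
        } else {
            odd += diff * diff;
        }
    }
    return even + odd;
}

// Callers go through this pointer; it defaults to the portable kernel.
l2sqr_fn l2sqr_hook = l2sqr_baseline;

// Hypothetical setup step, run once at startup with the result of a
// CPU capability probe.
void setup_hooks(bool cpu_has_wide_simd) {
    l2sqr_hook = cpu_has_wide_simd ? l2sqr_wide : l2sqr_baseline;
}
```

The point of the pattern is that only the file holding the wide kernel needs the extra compile flags, so a single binary can still start on CPUs that lack the extension as long as the pointer is never set to that kernel there.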

adarshs1310 · 2024-11-30T07:42:23Z

@alexanderguzhva @foxspy @hhy3

Requesting updates to the CI pipeline to address GCC header file conflicts.

Since Ubuntu 22.04 defaults to GCC-11, there is a risk of incorrect header usage when GCC-12 is installed. To resolve this, the GCC-11 header files should be removed after installing GCC-12.

The necessary changes have been incorporated into the ARM-based Ubuntu 22.04 Docker configuration as part of this PR, ensuring proper SVE compatibility. Kindly review and consider these adjustments for consistent and accurate builds. Thank you!

adarshs1310 · 2024-11-30T10:49:44Z

/kind feature

codecov · 2024-12-02T16:22:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.01%. Comparing base (3c46f4c) to head (da98fcf).
Report is 297 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##           main     #969       +/-   ##
=========================================
+ Coverage      0   73.01%   +73.01%     
=========================================
  Files         0       82       +82     
  Lines         0     7507     +7507     
=========================================
+ Hits          0     5481     +5481     
- Misses        0     2026     +2026

see 82 files with indirect coverage changes

alexanderguzhva · 2024-12-03T15:07:50Z

@adarshs1310 would you please rebase on top of master? It contains a fix for the UT. Thanks.

adarshs1310 · 2024-12-03T15:24:18Z

Sure @alexanderguzhva! The rebase has been done now!

adarshs1310 · 2024-12-09T07:08:44Z

Hi @alexanderguzhva , @foxspy , @hhy3 ,

Could you please look into the SSE fix when you have a chance? The ARM CI issue has already been handled in our code.

Thank you so much for your support!

foxspy · 2024-12-09T15:50:06Z

> Hi @alexanderguzhva, @foxspy, @hhy3,
>
> Could you please look into the SSE fix when you have a chance? The ARM CI issue has already been handled in our code.
>
> Thank you so much for your support!

The SSE CI is non-blocking, but the ARM CI is failing.

alexanderguzhva · 2024-12-09T17:36:30Z

@adarshs1310 I was able to compile and run knowhere on AWS Graviton 3 using GCC-12. I see the following error during unit tests, which seems to be a minor precision issue. Still, would you be able to find the root of this problem?

-------------------------------------------------------------------------------
Test Brute Force with input ids
-------------------------------------------------------------------------------
/home/ubuntu/zilliz/knowhere_sve/knowhere/tests/ut/test_bruteforce.cc:201
...............................................................................

/home/ubuntu/zilliz/knowhere_sve/knowhere/tests/ut/test_bruteforce.cc:243: FAILED:
  REQUIRE( gt_dis[i] == dis[i] )
with expansion:
  156769.3125f == 156769.32812f

Other unit tests pass.

The compilation is the following (given that you have created a profile for GCC 12 for conan):

mkdir build
cd build
conan install .. --build=missing -o with_diskann=True -o with_ut=True -o with_benchmark=True -s compiler.libcxx=libstdc++11 -c tools.build:cxxflags+=[\"-mcpu=neoverse-512tvb\",\"-march=native\"] -s build_type=Release --profile=gcc12
conan build ..

adarshs1310 · 2024-12-17T05:22:02Z

Rebase has been done

@foxspy @alexanderguzhva @hhy3

alexanderguzhva · 2024-12-17T15:29:59Z

src/simd/distances_sve.cc

+#include <cmath>
+
+#include "faiss/impl/platform_macros.h"
+#pragma GCC optimize("O3,fast-math,inline")


@adarshs1310 Could you please just remove this hacky pragma?
Thanks.

adarshs1310 · 2024-12-17

Sure, @alexanderguzhva, I have removed it. Thanks!

alexanderguzhva · 2024-12-18T20:34:22Z

lgtm

adarshs1310 · 2024-12-19T13:53:42Z

Hi @foxspy and @hhy3! Can we please get an update on this?

Thank you so much!

Presburger · 2024-12-20T15:34:24Z

At this stage, I am not inclined to accept this PR, as it would require us to maintain two sets of ARM binaries when releasing binaries or containers.

adarshs1310 · 2025-01-06T07:55:41Z

> At this stage, I am not inclined to accept this PR, as it would require us to maintain two sets of ARM binaries when releasing binaries or containers.

Thank you, @Presburger , for sharing this! We have now implemented a supports_sve() function that dynamically checks if the machine supports SVE at runtime. This approach eliminates the need to maintain separate binaries for ARM architectures.
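The implementation of `supports_sve()` is not shown in this thread. One common way to write such a check on Linux is to read the hardware capability bits from the auxiliary vector, sketched below under that assumption. The name `supports_sve_sketch` is hypothetical, the fallback value for `HWCAP_SVE` is the bit used by the Linux arm64 ABI, and on any other platform the sketch simply reports no SVE.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)  // SVE bit in AT_HWCAP on Linux arm64
#endif
#endif

// Hypothetical runtime probe: true only when the kernel reports SVE.
bool supports_sve_sketch() {
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    // Assumption: anything that is not Linux on AArch64 is treated as
    // having no SVE, so callers fall back to the NEON or reference path.
    return false;
#endif
}
```

A probe like this is only safe if it lives in a translation unit compiled without SVE flags; otherwise the compiler may emit SVE instructions before the check has run.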

adarshs1310 · 2025-01-09T06:01:10Z

Hi @Presburger @foxspy @alexanderguzhva @hhy3!

I am following up regarding the recent updates to this PR. As noted, we have implemented the supports_sve() function to dynamically verify SVE support at runtime, thereby removing the need to maintain two binaries for ARM.

In addition, we have been making further progress and remain committed to improving the Knowhere ecosystem, particularly by optimizing performance. We will continue to contribute actively to this project.

Could you kindly review the latest changes and let us know if this is ready to proceed?

Your feedback and guidance are highly valued.

Presburger · 2025-01-16T07:48:32Z

@adarshs1310 Thank you very much for your contribution. Could you please test on a CPU that does not support SVE to see if there are any illegal instruction issues? Additionally, when dynamic selection is supported, is the performance improvement still significant? Thank you very much for your work.

adarshs1310 · 2025-01-20T09:56:48Z

Hi @Presburger!

Thank you for your feedback and for pointing this out. We’ve tested the latest commit on a CPU that does not support SVE, and there are no illegal instruction issues on the NEON machine (m6g.16xlarge). Additionally, there is still a significant performance boost even when dynamic selection is supported. All test cases are passing successfully on the m7g.16xlarge instance.

However, on the main branch, in the test_simd function with the disabled BF16 patch, the threshold for validation is very strict, occasionally leading to test failures with a value difference of less than 0.1. It might be worth revisiting the tolerance in this case.

Your feedback is highly valued, and we look forward to continuing our contributions together.

adarshs1310 · 2025-01-23T05:18:18Z

Hi @Presburger @alexanderguzhva @foxspy @hhy3 ,

Just following up to check if our PR looks good to you or if there’s anything that needs to be addressed.

Presburger · 2025-01-24T06:12:11Z

src/simd/hook.cc

+        fvec_inner_product_batch_4 = fvec_inner_product_batch_4_neon;
+        fvec_L2sqr_batch_4 = fvec_L2sqr_batch_4_neon;
+
+        fp16_vec_inner_product_batch_4 = fp16_vec_inner_product_batch_4_neon;


By the way, please also implement the batch_4 variant of the inner product (IP) for SVE.
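As background for this request, a batch-of-four kernel computes the inner product of one query vector against four candidate vectors in a single pass, so each query element is loaded once and reused four times. The scalar sketch below shows the idea only; `ip_batch_4_ref` is a hypothetical name and its parameter list is an assumption, not a signature taken from this PR.

```cpp
#include <cstddef>

// Hypothetical scalar reference: inner products of x with y0..y3,
// all of dimension d, written to dis0..dis3.
void ip_batch_4_ref(const float* x,
                    const float* y0, const float* y1,
                    const float* y2, const float* y3,
                    std::size_t d,
                    float& dis0, float& dis1, float& dis2, float& dis3) {
    float d0 = 0.0f;
    float d1 = 0.0f;
    float d2 = 0.0f;
    float d3 = 0.0f;
    for (std::size_t i = 0; i < d; ++i) {
        const float xi = x[i];  // loaded once, reused for all four products
        d0 += xi * y0[i];
        d1 += xi * y1[i];
        d2 += xi * y2[i];
        d3 += xi * y3[i];
    }
    dis0 = d0;
    dis1 = d1;
    dis2 = d2;
    dis3 = d3;
}
```

A vectorized version would keep four independent accumulators in the same way, which also gives the CPU four independent dependency chains to overlap.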

Presburger · 2025-01-24T06:17:44Z

src/simd/distances_sve.cc

+    svbool_t pg = svptrue_b32();
+
+    while (i < n) {
+        if (n - i < svcntw())


The handling of the tail block here could be more elegant, as SVE's predicate registers are inherently friendly to out-of-bounds memory access.

Presburger · 2025-01-24T06:19:03Z

cmake/libs/libfaiss.cmake

+
+  set(UTILS_SRC src/simd/distances_ref.cc src/simd/distances_neon.cc)
+  set(UTILS_SVE_SRC src/simd/hook.cc src/simd/distances_sve.cc)
+  set(ALL_UTILS_SRC ${UTILS_SRC} ${UTILS_SVE_SRC})


You can directly place the SVE source file into the UTILS_SRC variable.

sre-ci-robot requested review from foxspy and hhy3 November 29, 2024 10:34

sre-ci-robot added the size/L label Nov 29, 2024

mergify bot added the needs-dco label Nov 29, 2024

mergify bot added the do-not-merge/missing-related-issue label Nov 29, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from 1783ca6 to 4186e1a Compare November 29, 2024 11:31

mergify bot added dco-passed and removed needs-dco labels Nov 29, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from e304e26 to fe28988 Compare November 29, 2024 12:29

alexanderguzhva reviewed Nov 29, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from 97b3ee0 to dde2c81 Compare November 30, 2024 07:12

sre-ci-robot added the kind/feature label Nov 30, 2024

mergify bot removed the do-not-merge/missing-related-issue label Nov 30, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch from dde2c81 to 9276ee8 Compare December 3, 2024 05:04

adarshs1310 requested a review from alexanderguzhva December 3, 2024 13:07

adarshs1310 force-pushed the sve-l2-fp32 branch from 9276ee8 to a626265 Compare December 3, 2024 15:21

mergify bot removed the ci-passed label Dec 17, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch from 704321d to e6abc41 Compare December 17, 2024 04:28

mergify bot added dco-passed and removed needs-dco labels Dec 17, 2024

alexanderguzhva reviewed Dec 17, 2024

adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from fba1adb to 09925c7 Compare December 17, 2024 15:39

mergify bot added the ci-passed label Dec 17, 2024

adarshs1310 requested review from alexanderguzhva and Presburger December 18, 2024 10:15

adarshs1310 force-pushed the sve-l2-fp32 branch from 09925c7 to 59f6ebf Compare January 6, 2025 07:13

mergify bot removed the ci-passed label Jan 6, 2025

adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from 554a685 to d6555e0 Compare January 6, 2025 07:20

adarshs1310 force-pushed the sve-l2-fp32 branch from d6555e0 to 35757c9 Compare January 13, 2025 04:36

Enable SVE Support for L2 Metric Computation in FP32

da98fcf

Signed-off-by: Adarsh Srivastava <[email protected]>

adarshs1310 force-pushed the sve-l2-fp32 branch from 0328ad1 to da98fcf Compare January 20, 2025 09:35

mergify bot added the ci-passed label Jan 20, 2025

Presburger reviewed Jan 24, 2025
