Vectorization #2

Open
slice4e opened this issue Jun 24, 2022 · 13 comments

@slice4e

slice4e commented Jun 24, 2022

Since we are comparing the value to a "window" of previous values during compression, I believe we may benefit from vectorizing the code - compare the value to multiple values concurrently using vector instructions.

It is possible that the compiler can auto-vectorize the window comparison loop (best option), if the correct flags are used.
Alternatively, assembly intrinsics can be used to vectorize.
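To make the suggestion concrete, here is a rough AVX2 sketch of the pattern (broadcast the candidate, compare it against four window entries per iteration). It is only an illustration: the function and constant names are made up, and the actual comparison in Window.cpp, which as far as I understand scores XOR "closeness" rather than plain equality, would need a different lane-wise operation.

```cpp
// Hypothetical sketch (not the actual Window.cpp): find an entry equal to
// `value` in a contiguous window of uint64_t, four lanes at a time with AVX2.
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

int find_in_window_avx2(const uint64_t* window, std::size_t window_size, uint64_t value)
{
    const __m256i needle = _mm256_set1_epi64x(static_cast<long long>(value));
    std::size_t i = 0;
    for (; i + 4 <= window_size; i += 4)
    {
        // Unaligned load of four 64-bit entries, compared lane-wise against the needle.
        const __m256i chunk = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(window + i));
        const __m256i eq    = _mm256_cmpeq_epi64(chunk, needle);
        const int mask = _mm256_movemask_pd(_mm256_castsi256_pd(eq));
        if (mask != 0)
            return static_cast<int>(i) + __builtin_ctz(static_cast<unsigned>(mask));
    }
    for (; i < window_size; i++)  // scalar tail for the last few entries
        if (window[i] == value)
            return static_cast<int>(i);
    return -1;  // no match
}
```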

@jermp
Collaborator

jermp commented Jun 25, 2022

Sure, thank you for the suggestion!
Are you willing to submit a PR with the vectorized code?

@slice4e
Author

slice4e commented Jun 28, 2022

Hi - I will look into it...

@andybbruno
Owner

Hey @slice4e 👋
thanks for creating this issue.

As you can see from the Makefile, we enabled the -O3 flag, but this clearly does not guarantee that vectorization is applied to every loop. I at least tried to identify what prevents the compiler from vectorizing the comparison function, and the message I got is this one:

test/../core/../lib/Window.cpp:53:9: remark: loop not vectorized: cannot identify array bounds [-Rpass-analysis=loop-vectorize]
        for (int i = 0; i < WINDOW_SIZE; i++)
        ^

AFAIR it's really hard to vectorize that part of the code, but of course I could be wrong 😊
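For what it's worth, the "cannot identify array bounds" remark usually goes away when the loop iterates over storage whose extent is visible to the compiler. Below is a minimal, hypothetical sketch, not the actual Window.cpp: it assumes the window can live in a fixed-size std::array, and the WINDOW_SIZE value and member names are made up for illustration.

```cpp
// Hypothetical sketch (not the actual Window.cpp): with a fixed-size
// std::array and a constant trip count, the extent of the storage is visible
// to the compiler, so -O3 has a chance to auto-vectorize the loop
// (verify with -Rpass=loop-vectorize).
#include <array>
#include <cstdint>

constexpr int WINDOW_SIZE = 128;  // value assumed for illustration

struct Window
{
    std::array<uint64_t, WINDOW_SIZE> data{};

    // Returns the index of the last entry equal to `value`, or -1 if none.
    // No early exit: the trip count is fixed, which the vectorizer can reason about.
    int find(uint64_t value) const
    {
        int match = -1;
        for (int i = 0; i < WINDOW_SIZE; i++)
            if (data[i] == value)
                match = i;
        return match;
    }
};
```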

Please feel free to open a PR in case you find a proper way to vectorize the code.

@lrg11

lrg11 commented Apr 13, 2023

@slice4e
It seems that the _mm512_load_epi64 call causes a segmentation fault on my Cascade Lake machine.
My compiler invocation is g++ -std=c++17 -march=cascadelake -mprefer-vector-width=512

@slice4e
Author

slice4e commented May 2, 2023

@lrg11 - I believe this typically occurs when the data you are accessing with a vectorized load is unaligned.
Could you try forcing a 64-byte aligned allocation of this buffer?
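To make the alignment point concrete, here is a minimal sketch (the buffer name and sizes are made up): either allocate the window 64-byte aligned so _mm512_load_epi64 is legal, or switch to the unaligned variant _mm512_loadu_si512, which has no alignment requirement.

```cpp
// Hypothetical sketch: two ways to avoid the fault with 512-bit loads.
// Buffer name and sizes are made up; build with -mavx512f (or -march=cascadelake).
#include <immintrin.h>
#include <cstdint>
#include <cstdlib>

int main()
{
    // Option 1: allocate the buffer 64-byte aligned so _mm512_load_epi64 is legal.
    // Note: with std::aligned_alloc the size must be a multiple of the alignment.
    auto* window = static_cast<uint64_t*>(std::aligned_alloc(64, 128 * sizeof(uint64_t)));
    for (int i = 0; i < 128; i++)
        window[i] = static_cast<uint64_t>(i);

    __m512i v0 = _mm512_load_epi64(window);        // aligned load: OK on a 64-byte boundary

    // Option 2: keep the existing allocation and use the unaligned load instead.
    __m512i v1 = _mm512_loadu_si512(window + 1);   // no alignment requirement

    (void)v0;
    (void)v1;
    std::free(window);
    return 0;
}
```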

@slice4e
Author

slice4e commented May 2, 2023

@lrg11 - after some thought, I believe it would be prudent to first understand what fraction of the execution time we are spending in the "vectorizable" code. This limits the potential upside (per Amdahl's Law), and there are also frequency implications: depending on the architecture, executing AVX-512 instructions will cause the CPU to lower its frequency, which may negatively impact the non-vectorized part of the code. For example, if we are optimizing a function which takes only 5% of the total execution time, even if we speed it up by 8x, there is no benefit if we lower the frequency of the overall execution by 5%.
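To put rough numbers on that argument (the figures below are illustrative, not measurements from this code):

```cpp
// Back-of-the-envelope check: fraction p of the runtime is vectorizable and
// sped up by s, while the whole program runs at a clock reduced by factor f.
#include <cstdio>

int main()
{
    double p = 0.05;   // 5% of the time spent in the vectorized part
    double s = 8.0;    // 8x speedup of that part
    double f = 0.95;   // 5% lower clock for everything

    // Amdahl's Law, scaled by the frequency penalty.
    double speedup = f / ((1.0 - p) + p / s);
    std::printf("net speedup: %.3fx\n", speedup);  // ~0.993x, i.e. a slight net loss
    return 0;
}
```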

@andybbruno
Owner

@slice4e @lrg11 thanks both for your interest 🙏

My 2 cents: if you take a look at our paper (see here), we found that on average 38% of the time we end up in the XOR case (the one @slice4e tried to speed up), which is the slowest step in this algorithm. So, if there's a faster method to find the "closest" XOR value, we can significantly improve the compression speed!

(screenshot: time breakdown from the paper)

@lrg11

lrg11 commented May 4, 2023

> @lrg11 - after some thought, I believe it would be prudent to first understand what fraction of the execution time we are spending in the "vectorizable" code. This limits the potential upside (per Amdahl's Law), and there are also frequency implications: depending on the architecture, executing AVX-512 instructions will cause the CPU to lower its frequency, which may negatively impact the non-vectorized part of the code. For example, if we are optimizing a function which takes only 5% of the total execution time, even if we speed it up by 8x, there is no benefit if we lower the frequency of the overall execution by 5%.

Right, I have verified this point; vectorizing sometimes turns out to be a performance loss.

@lrg11

lrg11 commented May 17, 2023

> @slice4e @lrg11 thanks both for your interest 🙏
>
> My 2 cents: if you take a look at our paper (see here), we found that on average 38% of the time we end up in the XOR case (the one @slice4e tried to speed up), which is the slowest step in this algorithm. So, if there's a faster method to find the "closest" XOR value, we can significantly improve the compression speed!
>
> (screenshot: time breakdown from the paper)

This could be done by keeping an index that stores, for each pattern of trailing bits, the position of the most recent value with those bits. There is an algorithm called Chimp built on the same insight (Ref). A rough sketch is below.
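If I understand the Chimp idea correctly, such an index could look like the following; the table size, the number of low bits, and all names are assumptions for illustration, not the actual Chimp implementation.

```cpp
// Rough sketch of a Chimp-style lookup: index previous values by their low
// bits so a candidate with a matching tail (hence an XOR with many trailing
// zeros) is found in O(1) instead of scanning the whole window.
#include <cstdint>
#include <vector>

constexpr int LOW_BITS   = 14;              // assumed for illustration
constexpr int TABLE_SIZE = 1 << LOW_BITS;

struct TailIndex
{
    std::vector<int64_t> last_pos = std::vector<int64_t>(TABLE_SIZE, -1);

    // Record that `value` was seen at position `pos` in the stream.
    void insert(uint64_t value, int64_t pos)
    {
        last_pos[value & (TABLE_SIZE - 1)] = pos;
    }

    // Return the most recent position of a value sharing the same low bits,
    // or -1 if none has been seen yet.
    int64_t find(uint64_t value) const
    {
        return last_pos[value & (TABLE_SIZE - 1)];
    }
};
```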

@andybbruno
Owner

Thanks @lrg11 for the hint!
Please feel free to improve our solution with the Chimp approach (just open a PR); it'll be highly appreciated ☺️

@lrg11

lrg11 commented May 18, 2023 via email

@andybbruno
Owner

I don't know if I still have the original files. Anyway, I'd rather use the datasets from this new paper so that we can validate our performance against theirs.

@lrg11

lrg11 commented May 18, 2023 via email
