Xnn s32 or #6823

Closed
wants to merge 6 commits into from

umadevimcw
Contributor

Bitwise OR implementation

xnn_storeu_s32(output + 1 * xnn_simd_size_s32, vy_1);
output += 32;
}
for (; batch >= xnn_simd_bytes_s32; batch -= xnn_simd_bytes_s32) {
Contributor

You use batch -= 32 * sizeof(int32_t) for the main loop, which is the correct amount for AVX512.
Should this one be batch -= 16 * sizeof(int32_t)?

Contributor Author

@fbarchard Two sets of inputs, vin1_0 and vin1_1, are processed here, like loop unrolling by 2, so it's 32.

Contributor

vin1 is loaded from input_a and holds 16 ints.
vin2 is loaded from input_b and holds 16 ints.
Should the loop be doing:
for (; batch >= 16 * sizeof(int32_t); batch -= 16 * sizeof(int32_t)) {
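The loop structure under discussion can be sketched in plain C. This is a scalar model under stated assumptions, not the PR's kernel: SIMD_SIZE, SIMD_BYTES, or_lanes, and or_s32_model are hypothetical names, and SIMD_SIZE is fixed at 16 to stand in for one AVX512 vector of int32 lanes. It shows the main loop consuming two vectors (32 ints) per iteration and the follow-up loop consuming one vector (16 ints), with batch counted in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the lane count and byte width of one vector. */
#define SIMD_SIZE 16
#define SIMD_BYTES (SIMD_SIZE * sizeof(int32_t))

/* Models one vector-wide OR as a scalar loop over n lanes. */
void or_lanes(const int32_t* a, const int32_t* b, int32_t* out, size_t n) {
  for (size_t i = 0; i < n; i++) {
    out[i] = a[i] | b[i];
  }
}

/* batch is a byte count and must be a multiple of sizeof(int32_t). */
void or_s32_model(size_t batch, const int32_t* a, const int32_t* b,
                  int32_t* out) {
  assert(batch % sizeof(int32_t) == 0);

  /* Main loop, unrolled by 2: two vectors (32 ints) per iteration. */
  for (; batch >= 2 * SIMD_BYTES; batch -= 2 * SIMD_BYTES) {
    or_lanes(a, b, out, SIMD_SIZE);
    or_lanes(a + SIMD_SIZE, b + SIMD_SIZE, out + SIMD_SIZE, SIMD_SIZE);
    a += 2 * SIMD_SIZE;
    b += 2 * SIMD_SIZE;
    out += 2 * SIMD_SIZE;
  }

  /* Second loop: one vector (16 ints) per iteration. */
  for (; batch >= SIMD_BYTES; batch -= SIMD_BYTES) {
    or_lanes(a, b, out, SIMD_SIZE);
    a += SIMD_SIZE;
    b += SIMD_SIZE;
    out += SIMD_SIZE;
  }

  /* Tail: fewer than 16 ints remain. */
  if (batch != 0) {
    or_lanes(a, b, out, batch / sizeof(int32_t));
  }
}
```

In this model the decrement of each loop matches the number of elements the loop body actually consumes, which is the property the review comments are checking.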

output += xnn_simd_size_s32;
}
if XNN_UNLIKELY(batch != 0) {
xnn_simd_s32_t vin1 = xnn_load_tail_s32(input_a, batch >> XNN_LOG2_SIZEOF_INT32_T);
Contributor

Normally for native we'd create a mask based on the remainder and use it for all the instructions.
The loop here would be the same as the previous loop, but with a mask.
It's also prudent to put an assert on the expected batch size to ensure we don't accidentally have too much or too little for the remainder masking to work.
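A minimal sketch of the masked remainder described above, written as a scalar model so it runs on any machine. The names tail_mask and or_s32_tail are hypothetical, and SIMD_SIZE is assumed to be 16 lanes. On AVX512 hardware the same 16-bit mask would typically be passed to masked intrinsics such as _mm512_maskz_loadu_epi32 and _mm512_mask_storeu_epi32; that mapping is an assumption about how the kernel might be written, not code from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count of one vector of int32 values. */
#define SIMD_SIZE 16

/* Returns a mask with one bit set per remaining element.
   The asserts guard the range in which remainder masking is valid. */
uint16_t tail_mask(size_t batch_bytes) {
  assert(batch_bytes != 0);
  assert(batch_bytes % sizeof(int32_t) == 0);
  assert(batch_bytes < SIMD_SIZE * sizeof(int32_t));
  const size_t n = batch_bytes / sizeof(int32_t); /* 1..15 elements */
  return (uint16_t) ((1u << n) - 1u);
}

/* Scalar model of a masked load, OR, and masked store:
   only lanes whose mask bit is set are read or written. */
void or_s32_tail(size_t batch_bytes, const int32_t* a, const int32_t* b,
                 int32_t* out) {
  const uint16_t mask = tail_mask(batch_bytes);
  for (size_t lane = 0; lane < SIMD_SIZE; lane++) {
    if (mask & (1u << lane)) {
      out[lane] = a[lane] | b[lane];
    }
  }
}
```

Because every load and store is gated by the same mask, the remainder path never touches memory past the end of the inputs or the output.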

@umadevimcw
Contributor Author

The OR op is part of #6836, so I'm closing this PR.

@umadevimcw umadevimcw closed this Aug 12, 2024
@umadevimcw umadevimcw deleted the xnn_s32_or branch August 12, 2024 06:50