Xnn s32 or #6823
xnn_storeu_s32(output + 1 * xnn_simd_size_s32, vy_1);
output += 32;
}
for (; batch >= xnn_simd_bytes_s32; batch -= xnn_simd_bytes_s32) {
You use batch -= 32 * sizeof(int32_t) for the main loop, which is the correct amount for AVX512.
Should this one be batch -= 16 * sizeof(int32_t)?
@fbarchard Two sets of inputs, vin1_0 and vin1_1, are processed here, like loop unrolling by 2, so it's 32.
vin1 is from input_a and is 16 ints.
vin2 is from input_b and is 16 ints.
Should the loop be doing:
for (; batch >= 16 * sizeof(int32_t); batch -= 16 * sizeof(int32_t)) {
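The stride arithmetic discussed above can be sketched in portable C. This is only an illustration, not the kernel from the PR: the names LANES, VECTOR_BYTES, or_vector and vor_s32_sketch are hypothetical, and a scalar loop stands in for one 16-lane AVX512 vector. It assumes batch is a byte count, as in the snippet under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one AVX512 vector: 16 int32 lanes (64 bytes). */
#define LANES 16
#define VECTOR_BYTES (LANES * sizeof(int32_t))

/* Scalar emulation of OR-ing one vector's worth of lanes. */
static void or_vector(const int32_t* a, const int32_t* b, int32_t* out) {
  for (size_t i = 0; i < LANES; ++i) {
    out[i] = a[i] | b[i];
  }
}

/* batch is a byte count. Returns the leftover bytes for a tail handler. */
size_t vor_s32_sketch(size_t batch, const int32_t* a, const int32_t* b,
                      int32_t* out) {
  assert(batch % sizeof(int32_t) == 0);
  /* Main loop: unrolled by 2, so 32 elements (128 bytes) per iteration. */
  for (; batch >= 2 * VECTOR_BYTES; batch -= 2 * VECTOR_BYTES) {
    or_vector(a, b, out);
    or_vector(a + LANES, b + LANES, out + LANES);
    a += 2 * LANES;
    b += 2 * LANES;
    out += 2 * LANES;
  }
  /* Single-vector loop: 16 elements (64 bytes) per iteration. */
  for (; batch >= VECTOR_BYTES; batch -= VECTOR_BYTES) {
    or_vector(a, b, out);
    a += LANES;
    b += LANES;
    out += LANES;
  }
  return batch;
}
```

For example, a batch of 50 elements (200 bytes) would take one pass through the unrolled loop and one through the single-vector loop, leaving 8 bytes (2 elements) for the remainder path.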
output += xnn_simd_size_s32;
}
if XNN_UNLIKELY(batch != 0) {
xnn_simd_s32_t vin1 = xnn_load_tail_s32(input_a, batch >> XNN_LOG2_SIZEOF_INT32_T);
Normally for native we'd create a mask based on the remainder and use it for all the instructions.
The loop here would be the same as the previous loop, but with a mask.
It's also prudent to put an assert on the expected batch size to ensure we don't accidentally have too much or too little for the remainder masking to work.
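A minimal sketch of the masked-remainder idea, again in portable C rather than real intrinsics. TAIL_LANES, tail_mask and vor_s32_tail_sketch are hypothetical names; the 16-bit mask emulates an AVX512 lane mask, and the asserts illustrate the kind of batch-size check suggested above. The exact bounds are an assumption (a non-empty remainder of fewer than 16 elements).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width: 16 int32 lanes, as on AVX512. */
#define TAIL_LANES 16

/* Build a lane mask with the low `count` bits set (count in 1..15). */
static uint16_t tail_mask(size_t count) {
  assert(count >= 1 && count < TAIL_LANES);
  return (uint16_t)((1u << count) - 1u);
}

/* Emulates a masked OR: only lanes selected by the mask are read/written. */
void vor_s32_tail_sketch(size_t batch, const int32_t* a, const int32_t* b,
                         int32_t* out) {
  /* The remainder must be non-empty, a whole number of elements,
     and smaller than one full vector for the masking to be valid. */
  assert(batch != 0);
  assert(batch % sizeof(int32_t) == 0);
  assert(batch < TAIL_LANES * sizeof(int32_t));

  const uint16_t mask = tail_mask(batch / sizeof(int32_t));
  for (size_t lane = 0; lane < TAIL_LANES; ++lane) {
    if (mask & (1u << lane)) {
      out[lane] = a[lane] | b[lane];
    }
  }
}
```

Because inactive lanes are never touched, the tail can reuse the same body as the full-vector loop without reading or writing past the end of the buffers.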
The OR op is part of #6836, hence closing it.
Bitwise OR implementation