[RFC] Multithread packing #7545
base: master
Conversation
@alankelly @dsharlet @fbarchard Putting up an RFC for multithreading packing at subgraph create. Please take a look at the commits labeled [MultiThreaded Packing][RFC].
We see around a 4.5x perf win on first-time model load.
@@ -0,0 +1,9 @@

// Copyright 2024 Google LLC
😅
size_t extra_bytes_bl,
size_t extra_bytes_n,
Why do we need these from the API point of view?
src/packw.c
Outdated
/*extra_bytes=*/context->extra_bytes_bl,
/*extra_bytes_n=*/context->extra_bytes_n,
/*params=*/context->params);
Remove empty line.
const uint8_t* w0 = (const uint8_t*) weights;
const uint16_t* s0 = (const uint16_t*) scale;
size_t n = nc;
for (; n >= ${NR}; n -= ${NR}) {
Indent 4.
// KC/2 bytes is KC nibbles
$for N in range(1, NR):
  const uint8_t* w${N} = w${N-1} + (kc >> 1);
This kernel does not support odd KC? How do we ensure convolutions don't call this when kc is odd?
The x4-packing scalar kernel still supports any KC, just inefficiently; from the avxvnni kernel I check whether KC is odd and call the scalar kernel to handle it.
This kernel isn't used for convolutions, only for fully connected layers.
A constraint applied to blockwise quantization is that kc must be divisible by the block size, and the block size is a multiple of 32.
Code pointer? Let's make sure these are not just asserts and that we fail with a proper error message, if we aren't doing that already.
Force-pushed from bbb8982 to eb8384b
Force-pushed from eb8384b to 30bd2ad
I'd prefer the scalar kernel be in its own PR, but the kernel part of this looks OK.
Enable multithreaded packing routines in xnn_create. This allows us to multithread our packing routines at initialization, which makes first-time model loads more performant.