Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf #3707
base: main
This pull request was exported from Phabricator. Differential Revision: D69573878
✅ Deploy Preview for pytorch-fbgemm-docs ready!
…alf (pytorch#3707)
Summary:
X-link: facebookresearch/FBGEMM#789
QuantUtilsNeon.cc has been introduced, Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon implemented as first function
We have observed a ~x12 performance improvement for the downcasting case.
The case where a float32_t is returned maintains the same speed:
Full results:
before: P1732996851
after: P1732996401
Differential Revision: D69573878
99e2504 to 7d12c46
7d12c46 to 3bff7e9
3bff7e9 to a8931d3
a8931d3 to 6fdcc64
6fdcc64 to 0556e72
0556e72 to 4757d4c
Summary:
X-link: facebookresearch/FBGEMM#788
We are adding this benchmark to measure performance of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf
Differential Revision: D69670176
4757d4c to 09852aa
Summary:
This change introduces QuantUtilsNeon.cc, with Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon implemented as its first function.
We have observed a roughly 12x performance improvement for the downcasting case.
The case where a float32_t is returned runs at the same speed as before.
Full results:
before: P1732996851
after: P1732996401
Differential Revision: D69573878
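
For readers unfamiliar with the operation being vectorized, here is a minimal scalar sketch of fused 8-bit rowwise dequantization. It is not the code from this PR and uses no NEON intrinsics. It assumes the fused row layout of `output_columns` uint8 values followed by a float scale and a float bias, and it covers only the float32 output path; the downcasting path would additionally narrow each result to half precision. The function name `dequantize_fused_8bit_rowwise` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of FBGEMM. Assumed row layout:
// [uint8 values ... | float scale | float bias].
std::vector<float> dequantize_fused_8bit_rowwise(
    const std::vector<std::uint8_t>& input,
    std::size_t rows,
    std::size_t input_columns) {
  const std::size_t tail = 2 * sizeof(float);  // scale + bias per row
  assert(input_columns >= tail);
  assert(input.size() == rows * input_columns);
  const std::size_t output_columns = input_columns - tail;

  std::vector<float> output(rows * output_columns);
  for (std::size_t r = 0; r < rows; ++r) {
    const std::uint8_t* row = input.data() + r * input_columns;
    float scale = 0.0f;
    float bias = 0.0f;
    // memcpy avoids unaligned or aliasing-unsafe float reads.
    std::memcpy(&scale, row + output_columns, sizeof(float));
    std::memcpy(&bias, row + output_columns + sizeof(float), sizeof(float));
    for (std::size_t c = 0; c < output_columns; ++c) {
      // Dequantize: value = quantized * scale + bias.
      output[r * output_columns + c] =
          static_cast<float>(row[c]) * scale + bias;
    }
  }
  return output;
}
```

A vectorized version would process several columns of the inner loop per iteration; the per-row scale and bias stay constant across that loop, which is what makes the kernel a good fit for SIMD.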