Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf #3707

Nicoshev · 2025-02-18T19:26:31Z

Summary:
This change introduces QuantUtilsNeon.cc, with Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon as the first function implemented in it.

We observed a roughly 12x performance improvement for the downcasting case (half-precision output).
The case where float32_t is returned runs at the same speed as before.

Full results:

before:

P1732996851

after:

P1732996401

Differential Revision: D69573878
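
For readers unfamiliar with the operation, a minimal scalar sketch of the dequantization this function performs may help. It assumes the fused row layout used by FBGEMM's reference path: each row holds the 8-bit quantized values followed by a float scale and a float bias, so each output row has input_columns - 8 values. The name dequantize_rows_sketch and the std::vector interface are hypothetical, only the float output path is shown, and this is not the NEON code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical scalar sketch, not the FBGEMM implementation.
// Assumed row layout: [q_0 ... q_{n-1} | float scale | float bias].
std::vector<float> dequantize_rows_sketch(
    const std::vector<std::uint8_t>& input,
    std::size_t input_rows,
    std::size_t input_columns) {
  const std::size_t tail = 2 * sizeof(float);  // scale + bias bytes per row
  assert(input_columns >= tail);
  assert(input.size() >= input_rows * input_columns);

  const std::size_t output_columns = input_columns - tail;
  std::vector<float> output(input_rows * output_columns);

  for (std::size_t row = 0; row < input_rows; ++row) {
    const std::uint8_t* in_row = input.data() + row * input_columns;

    // memcpy avoids unaligned or type-punned reads of the trailing floats.
    float scale = 0.0f;
    float bias = 0.0f;
    std::memcpy(&scale, in_row + output_columns, sizeof(float));
    std::memcpy(&bias, in_row + output_columns + sizeof(float), sizeof(float));

    // Each quantized byte maps back to q * scale + bias.
    for (std::size_t col = 0; col < output_columns; ++col) {
      output[row * output_columns + col] =
          static_cast<float>(in_row[col]) * scale + bias;
    }
  }
  return output;
}
```

The downcasting case would additionally convert each result to half precision before storing it; that conversion is omitted here.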

facebook-github-bot · 2025-02-18T19:26:44Z

This pull request was exported from Phabricator. Differential Revision: D69573878

netlify · 2025-02-18T19:26:51Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`09852aa`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67b63a5fa10a8c0009927e2b
😎 Deploy Preview	https://deploy-preview-3707--pytorch-fbgemm-docs.netlify.app

…alf (pytorch#3707)

Summary: X-link: facebookresearch/FBGEMM#789

QuantUtilsNeon.cc has been introduced, Fused8BitRowwiseQuantizedSBFloatToFloatOrHalfNeon implemented as first function

We have observed a ~x12 performance improvement for the downcasting case. The case where a float32_t is returned maintains the same speed.

Full results: before: P1732996851, after: P1732996401

Differential Revision: D69573878

facebook-github-bot · 2025-02-18T19:33:08Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-18T21:39:21Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-18T21:41:52Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-18T23:42:52Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-18T23:45:16Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-18T23:48:42Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot · 2025-02-19T20:09:21Z

This pull request was exported from Phabricator. Differential Revision: D69573878

facebook-github-bot added the cla signed label Feb 18, 2025

facebook-github-bot added the fb-exported label Feb 18, 2025

Nicoshev force-pushed the export-D69573878 branch from 99e2504 to 7d12c46 on February 18, 2025 19:32

Nicoshev force-pushed the export-D69573878 branch from 7d12c46 to 3bff7e9 on February 18, 2025 21:39

Nicoshev force-pushed the export-D69573878 branch from 3bff7e9 to a8931d3 on February 18, 2025 21:41

Nicoshev force-pushed the export-D69573878 branch from a8931d3 to 6fdcc64 on February 18, 2025 23:42

Nicoshev force-pushed the export-D69573878 branch from 6fdcc64 to 0556e72 on February 18, 2025 23:44

Nicoshev force-pushed the export-D69573878 branch from 0556e72 to 4757d4c on February 18, 2025 23:48

Nicoshev added 2 commits February 19, 2025 12:08

Add Quantize benchmark (pytorch#3706)

c9dc2e0

Summary: X-link: facebookresearch/FBGEMM#788 We are adding this benchmark to measure performance of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf Differential Revision: D69670176

Nicoshev force-pushed the export-D69573878 branch from 4757d4c to 09852aa on February 19, 2025 20:09
