[bc-breaking] enable direct configuration in quantize_ #1595

vkuzo · 2025-01-22T16:49:12Z

summary

This PR enables passing per-workflow arguments to quantize_ directly, without wrapping them in a Callable.

Motivation: passing direct configuration is intuitive and widely used in similar contexts across various projects. Passing configuration wrapped in a callable is IMO not intuitive, hard to understand and debug, and we have evidence that it pushes a portion of users away from building on top of torchao.

We will keep the old callable syntax supported by quantize_ for one release cycle, and delete it afterwards. We will keep the old names as aliases for new names going forward (example: int4_weight_only as an alias of Int4WeightOnlyConfig) to keep existing callsites working without changes.
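A minimal sketch of the aliasing idea, assuming the old name is simply bound to the new config class. The class bodies here are simplified stand-ins, not the actual torchao definitions:

```python
import abc
from dataclasses import dataclass


class AOBaseConfig(abc.ABC):
    """Stand-in for the config base class described in this PR."""


@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    # Only one field is shown; the real config has more.
    group_size: int = 128


# Assumption: the old name is kept as a plain alias of the new class,
# so existing call sites keep working without changes.
int4_weight_only = Int4WeightOnlyConfig

config = int4_weight_only(group_size=32)
# A dataclass instance prints its class name and fields, which makes the
# configuration easy to inspect.
print(config)
```

With an alias like this, a call through the old name returns a config object rather than a callable, which is why the old and new spellings can be passed to quantize_ interchangeably.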

user facing API changes

signature of quantize_

#
# before
#
def quantize_(
    model: torch.nn.Module,
    apply_tensor_subclass: Callable[[torch.nn.Module], torch.nn.Module],
    ...,
): ...

#
# after - intermediate state, support both old and new for one release
#
def quantize_(
    model: torch.nn.Module,
    config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
    ...,
): ...

#
# after - long term state
#
def quantize_(
    model: torch.nn.Module,
    config: AOBaseConfig,
    ...,
): ...

usage example

An example for int4_weight_only

#
# before
#
quantize_(m, int4_weight_only(group_size=32))

#
# after, with new user facing names
#
quantize_(m, Int4WeightOnlyConfig(group_size=32))

#
# AND, after, with BC names
#
quantize_(m, int4_weight_only(group_size=32))
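The intermediate state, where both a config object and a legacy callable are accepted, could be sketched roughly as follows. This is a simplified illustration and not the torchao implementation: the model is a plain dict, and `_apply_config` is a hypothetical helper.

```python
import abc
from dataclasses import dataclass
from typing import Callable, Union


class AOBaseConfig(abc.ABC):
    """Stand-in for the config base class described in this PR."""


@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    group_size: int = 128


def _apply_config(model: dict, config: AOBaseConfig) -> None:
    # Hypothetical helper: record which config was applied to the model.
    model["quantized_with"] = config


def quantize_(
    model: dict,
    config: Union[AOBaseConfig, Callable[[dict], dict]],
) -> None:
    """Simplified stand-in that accepts both the new and the old syntax."""
    if isinstance(config, AOBaseConfig):
        # New path: configuration is a plain, inspectable object.
        _apply_config(model, config)
    elif callable(config):
        # Old path: the callable carries its configuration internally.
        config(model)
    else:
        raise TypeError(f"unsupported config type: {type(config)!r}")


new_style = {}
quantize_(new_style, Int4WeightOnlyConfig(group_size=32))

old_style = {}
quantize_(old_style, lambda m: m.update(legacy=True) or m)
```

Once the old path is deleted, only the `isinstance` branch would remain, matching the long term signature above.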

developer facing changes

See the PR details for examples, but they can be summarized as:

#
# old
#

# quantize_ applies the callable returned by this function to each module of the model
def int4_weight_only(group_size: int, ...) -> Callable:

    def new_callable(weight: torch.Tensor):
        # configuration is captured here via local variables
        ...
        
    # return type is a Callable
    return _get_linear_subclass_inserter(new_callable)

#
# new
#

# config base class
class AOBaseConfig(abc.ABC):
    pass

# user facing configuration of a workflow
@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    group_size: int = 128
    ...

# not user facing transform of a module according to a workflow's configuration
@register_quantize_module_handler(Int4WeightOnlyConfig)
def _int4_weight_only_transform(
    module: torch.nn.Module, 
    config: Int4WeightOnlyConfig,
) -> torch.nn.Module:
    # map to AQT, not user facing
    ...

current status

The current PR migrates three user facing workflows:

PTQ's int4_weight_only
QAT's intx_quantization_aware_training and from_intx_quantization_aware_training

I've chosen to migrate one PTQ and two QAT workflows to prove generality of the new flow, but avoid a high LOC in this PR to make it easier to review. We will migrate the rest of the workflows in future PRs, detailed below:

int8_dynamic_activation_int4_weight
int8_dynamic_activation_int8_weight
int8_dynamic_activation_int8_semi_sparse_weight
int8_weight_only
float8_weight_only
float8_dynamic_activation_float8_weight
float8_static_activation_float8_weight
uintx_weight_only
fpx_weight_only
gemlite_uintx_weight_only
callsites from the prototype folder

After a release cycle, we will delete the old callable syntax.

Test Plan:

pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path


vkuzo · 2025-01-22T16:49:13Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2025-01-22T16:49:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1595

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fac3263 with merge base 999b16d ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:

POC for:
* decoupling configuration from transformation
* stop passing obscure stateful callables around
* enable printing of configuration
* reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module

TODO more polish before wider discussion.

Test Plan:

```
pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path
```

ghstack-source-id: fb0703f88413bc06962dacde24ff6bb7cf0f3b19
ghstack-comment-id: 2607756510
Pull Request resolved: #1595


torchao/core/config.py

torchao/quantization/_transform_module.py


andrewor14

Looks great! Mostly just minor doc nits.

andrewor14 · 2025-02-05T21:30:53Z

test/dtypes/test_affine_quantized.py

@@ -180,8 +187,13 @@ def apply_uint6_weight_only_quant(linear):
    )
    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
    def test_print_quantized_module(self, apply_quant):
+        print(apply_quant)


test/dtypes/test_affine_quantized.py

andrewor14 · 2025-02-05T21:33:16Z

torchao/core/config.py

@@ -0,0 +1,10 @@
+import abc


I feel we can just add this to torchao/config.py without making a new core directory. No strong preference though

slightly stronger preference is I feel "core" shouldn't appear in the import, so users should be able to do this:

from torchao.config import AOBaseConfig

but we can do that by adding this to __init__.py

andrewor14 · 2025-02-05T21:35:23Z

test/quantization/test_qat.py

@@ -1185,7 +1185,7 @@ def test_qat_prototype_bc(self):
    @unittest.skipIf(
        not TORCH_VERSION_AT_LEAST_2_4, "skipping when torch version is 2.4 or lower"
    )
-    def test_quantize_api(self):
+    def test_quantize_api_standalone(self):


do we need this change?

it's convenient for filtering to only this test from the command line. I can remove it if you'd like.

torchao/quantization/qat/api.py

torchao/core/config.py

andrewor14 · 2025-02-05T21:41:22Z

torchao/quantization/quant_api.py

+            handler,
+            _is_linear if filter_fn is None else filter_fn,
+            device=device,
+            extra_args=(config,),


alternatively we can pass in a lambda, then we don't need to add extra_args or pass in config:

replace_fn = lambda mod: handler(mod, config)

seems simpler

I'm really not a fan of passing callables around, it's easy when the callable is simple but easy for future people to tack ugly stuff on and increase complexity. Non-callable args make it harder to make the code ugly in the future.

oh sorry, I meant pass in replace_fn instead of handler, like:

replace_fn = lambda mod: handler(mod, config)
_replace_with_custom_fn_if_matches_filter(
    model,
    replace_fn,
    _is_linear if filter_fn is None else filter_fn,
    device=device,
)

either way you're passing a callable

hmm, still not a fan of replace_fn = lambda mod: handler(mod, config). This changes replace_fn from a stateless callable to a stateful callable, where the state is hard to inspect. It's less LOC but harder to debug IMO.
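To make the two options in this thread concrete, here is a small sketch comparing them. `replace_matching` is a hypothetical, simplified stand-in for `_replace_with_custom_fn_if_matches_filter`, modules are plain strings, and `Config` is an illustrative dataclass.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Illustrative config, not a torchao class.
    group_size: int = 128


def handler(module: str, config: Config) -> str:
    # Stateless transform: everything it needs arrives as arguments.
    return f"{module}[int4,g={config.group_size}]"


def is_linear(module: str) -> bool:
    return module.startswith("linear")


def replace_matching(modules, replacement_fn, filter_fn, extra_args=()):
    """Apply replacement_fn(module, *extra_args) to modules passing filter_fn."""
    return [
        replacement_fn(m, *extra_args) if filter_fn(m) else m
        for m in modules
    ]


modules = ["linear0", "relu", "linear1"]
config = Config(group_size=32)

# Option 1: pass the stateless handler and the config as an explicit argument.
explicit = replace_matching(modules, handler, is_linear, extra_args=(config,))

# Option 2: bind the config into a lambda; the config is now hidden state.
replace_fn = lambda mod: handler(mod, config)
via_lambda = replace_matching(modules, replace_fn, is_linear)

print(explicit)
print(via_lambda)
```

Both options produce the same result. The difference under discussion is where the config lives: as a visible argument in the first case, or captured inside the callable in the second.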

torchao/quantization/transform_module.py


msaroufim · 2025-02-10T21:37:58Z

test/dtypes/test_affine_quantized.py

+                quantize_(linear, apply_quant)
+            else:
+                # TODO(#1690): delete this once config migration is done
+                ql = apply_quant(linear)


We have a few partners where we need to forward fix BC issues, including HuggingFace transformers, Optimum, SGLang and Diffusers

@msaroufim do you have a link?

I don't expect any BC breakages of people using the quantize_ API as specified in the docs. The BC breaking change would be if people are applying their transform on linear layers directly, without using quantize_.

HF callsite: https://github.com/huggingface/transformers/blob/1feebb5b4150882deabddd190a541f336f3be817/src/transformers/quantizers/quantizer_torchao.py#L199

SGLANG callsite: https://github.com/sgl-project/sglang/blob/2f47d710ae9cb1bdbbe0fe2392a0634827d257b3/python/sglang/srt/layers/torchao_utils.py#L39

Diffusers callsite: https://github.com/huggingface/diffusers/blob/7fb481f840b5d73982cafd1affe89f21a5c0b20b/src/diffusers/quantizers/torchao/torchao_quantizer.py#L234

we should definitely test these, but they look like they will be unaffected to me
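The BC break described in this thread can be illustrated with a small sketch. It assumes the new config is a plain dataclass with no `__call__`; `legacy_int4_weight_only` is a hypothetical stand-in for the old callable-returning function, and modules are plain strings.

```python
import abc
from dataclasses import dataclass


class AOBaseConfig(abc.ABC):
    """Stand-in for the config base class described in this PR."""


@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    group_size: int = 128


def legacy_int4_weight_only(group_size: int = 128):
    """Hypothetical stand-in for the old API, which returned a callable."""

    def apply(linear: str) -> str:
        return f"{linear}[int4,g={group_size}]"

    return apply


legacy = legacy_int4_weight_only(group_size=32)
new = Int4WeightOnlyConfig(group_size=32)

# Old style: the returned object can be applied to a layer directly.
print(legacy("linear"))

# New style: the config is data, so applying it directly no longer works;
# under this assumption it has to go through quantize_ instead.
print(callable(legacy), callable(new))
```

Call sites that only pass the object to quantize_ are unaffected by this difference, which is consistent with the expectation that the linked partner call sites keep working.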

[ghstack-poisoned]

Update

24114ce

[ghstack-poisoned]

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 22, 2025

vkuzo changed the title ~~[wip] configs configs configs!~~ [rfc] enable direct configuration in quantize_, v2 Jan 22, 2025

vkuzo added the topic: bc-breaking Use this tag if this PR breaks backward compatibility label Jan 22, 2025

vkuzo mentioned this pull request Jan 22, 2025

[rfc] enable direct configuration in quantize_ #1585

Closed

Update

5b9d876

[ghstack-poisoned]

Update

1cea42f

[ghstack-poisoned]

Update

138883b

[ghstack-poisoned]

Update

ba045ea

[ghstack-poisoned]

Update

94d9426

[ghstack-poisoned]

vkuzo requested review from andrewor14, jerryzh168, drisspg and HDCharles January 23, 2025 16:15

vkuzo changed the title ~~[rfc] enable direct configuration in quantize_, v2~~ [bc-breaking] enable direct configuration in quantize_, v2 Jan 23, 2025

vkuzo changed the title ~~[bc-breaking] enable direct configuration in quantize_, v2~~ [bc-breaking] enable direct configuration in quantize_ Jan 23, 2025

drisspg reviewed Jan 23, 2025

View reviewed changes

torchao/core/config.py Outdated Show resolved Hide resolved

drisspg reviewed Jan 23, 2025

View reviewed changes

torchao/quantization/_transform_module.py Outdated Show resolved Hide resolved

drisspg reviewed Jan 23, 2025

View reviewed changes

torchao/quantization/_transform_module.py Outdated Show resolved Hide resolved

Update

b589ce7

[ghstack-poisoned]

drisspg approved these changes Jan 23, 2025

View reviewed changes

vkuzo mentioned this pull request Jan 29, 2025

make smoothquant more PT2 friendly #1639

Open

Update

aaba2d8

[ghstack-poisoned]

Update

26850da

[ghstack-poisoned]

andrewor14 approved these changes Feb 5, 2025

View reviewed changes

Update

7caecb1

[ghstack-poisoned]

msaroufim reviewed Feb 10, 2025

View reviewed changes

This was referenced Feb 10, 2025

placeholder for migrating workflow configuration to AOBaseConfig #1690

Open

config migration: float8* #1694

Open

Update

0542402

[ghstack-poisoned]

This was referenced Feb 11, 2025

config migration: int* #1696

Open

config migration: fpx, gemlite, uintx #1697

Open

Update

fac3263

[ghstack-poisoned]
