Hello, I have integrated torchao into my training, but I think it's not very clear what inference should look like.
Should I use the converted FP8 linear layer to do inference? Is delayed scaling supposed to work in inference?
Or, should I use the original linear layer to do inference?
Thanks in advance if you can help clarify!
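For context, here is a minimal sketch of the kind of training integration I mean, using torchao's float8 API (the model and sizes are placeholders):

```python
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# Placeholder model standing in for the real network.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).cuda().to(torch.bfloat16)

# Swap eligible nn.Linear modules for Float8Linear for FP8 training.
convert_to_float8_training(model)

# Training proceeds as usual: master weights stay in high precision
# and are cast to float8 on the fly for the matmuls.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
loss = model(x).sum()
loss.backward()
optimizer.step()
```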
Do you need distributed inference, or are you doing inference on a single GPU?
For single GPU, I think using the original model definition and loading the quantized weights should ideally "just work", @vkuzo to confirm. If not, please file an RFC in ao.
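A minimal sketch of that flow, assuming dynamic scaling so that the checkpoint keys match the original nn.Linear keys (names and sizes are placeholders):

```python
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

def build_model():
    # Placeholder for the real model definition.
    return nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024))

# Training: Float8Linear keeps master weights in high precision, so the
# checkpoint holds ordinary weight tensors.
train_model = build_model().cuda().to(torch.bfloat16)
convert_to_float8_training(train_model)
# ... run training ...
torch.save(train_model.state_dict(), "ckpt.pt")

# Inference: load the checkpoint into the original, unconverted model.
# strict=False guards against any extra float8 buffers (e.g. from
# delayed scaling) that the training checkpoint may carry.
infer_model = build_model().cuda().to(torch.bfloat16)
infer_model.load_state_dict(torch.load("ckpt.pt"), strict=False)
infer_model.eval()
```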
For Distributed Inference, we are building DTensor + Quantized Tensor support in torchchat. (We are yet to publish a demo.) There is also a simple ao + TP example in the ao repo: link.
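Not a substitute for the linked example, but here is a rough sketch of how an ao inference quantization could compose with PyTorch tensor parallelism. `quantize_` and `float8_weight_only` are the torchao.quantization APIs; the sharding plan and the single-node mesh setup are assumptions for illustration:

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)
from torchao.quantization import quantize_, float8_weight_only

# Assumes launch via torchrun, one process per GPU on a single node.
mesh = init_device_mesh("cuda", (torch.cuda.device_count(),))

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.Linear(4096, 1024),
).cuda().to(torch.bfloat16)

# Quantize the linear weights to float8 for inference.
quantize_(model, float8_weight_only())

# Shard the first linear column-wise and the second row-wise.
parallelize_module(model, mesh, {"0": ColwiseParallel(), "1": RowwiseParallel()})

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    y = model(x)
```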
If the original model definition works for single-GPU inference, does that mean I could just use my current distributed inference setup as-is? The ao + TP example appears to use the torchao-converted FP8 linear layer for inference, which is different from what you suggest for single GPU. To be honest, I'm a little confused.