In NVIDIA's quantization toolkit, the first step is calibration to obtain the quantization scale, and the second step is to fine-tune based on that scale.
Based on their experience, they advise:
Do not change the quantization representation (scale) during training, or at least not too frequently. Changing the scale every step is effectively like changing the data format (e8m7, e5m10, e3m4, etc.) every step, which can easily hurt convergence.
However, in the QAT examples, the scale is updated every training iteration.
Is updating the scale every iteration the better approach in your experience?
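For concreteness, below is a minimal, hand-rolled sketch of the two scale-update policies being compared. This is not the NVIDIA toolkit's actual API; the names `fake_quantize`, `calibrate_scale`, and `per_step_scale` are illustrative assumptions.

```python
import torch

def fake_quantize(x: torch.Tensor, scale: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward pass uses quantized values,
    # backward pass lets gradients flow through unchanged.
    return x + (x_q - x).detach()

# Policy 1: calibrate once, then keep the scale fixed while fine-tuning.
@torch.no_grad()
def calibrate_scale(calib_loader, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    amax = 0.0
    for x, _ in calib_loader:          # assumes the loader yields (input, label) pairs
        amax = max(amax, x.abs().max().item())
    return torch.tensor(amax / qmax)   # reused unchanged for every training step

# Policy 2: recompute the scale from the current batch at every iteration,
# as the QAT examples appear to do.
def per_step_scale(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    return x.abs().max().detach() / qmax  # scale changes every training step
```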