This repository has been archived by the owner on Dec 3, 2024. It is now read-only.

Why does the quantized value still exceed the range of FP8 representation? #15

Open
adfad1 opened this issue Mar 30, 2024 · 0 comments

adfad1 commented Mar 30, 2024

Hi, thanks for providing such a complete toolkit, but I have some questions about it.

I used this toolkit to evaluate ResNet-18 on CIFAR-10 with FP8 in hybrid mode, and I found that the outputs of this operation still exceed the range of FP8 representation:

outputs = fpemu_cuda.forward(input.contiguous(), mode, size, inplace, scale, blocknorm, blocksize)

For example, input.data[0,0] before this operation is

tensor([[-0.0089,  0.0410,  0.0068],
        [ 0.0663,  0.0292,  0.0986],
        [ 0.0737,  0.0730,  0.0111]], device='cuda:1', dtype=torch.float16)

after this operation, output[0,0] is

tensor([[-0.0090,  0.0405,  0.0068],
        [ 0.0676,  0.0293,  0.0991],
        [ 0.0721,  0.0721,  0.0113]], device='cuda:1', dtype=torch.float16)

mode is 'E4M3_RNE'. The problem is that the first output value, -0.0090, is 1 01000 0010011100 in FP16 binary, and this value clearly still exceeds what FP8 can represent exactly. Why can -0.0089 be quantized to -0.0090 under 'E4M3_RNE' mode?
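For reference when checking representability, here is a minimal pure-Python sketch of rounding a value to the nearest OCP FP8 E4M3 value with round-to-nearest-even. This is an illustration of the format, not the toolkit's actual CUDA kernel; note also that if the kernel applies a per-tensor `scale` before quantization, the representable outputs become `scale ×` this grid rather than the unscaled grid shown here.

```python
import math

def quantize_e4m3_rne(x: float) -> float:
    """Round x to the nearest OCP E4M3 value (1 sign, 4 exp, 3 mantissa bits).

    Minimal sketch: handles normals, subnormals, and saturation at +/-448;
    Python's round() implements round-half-to-even (RNE).
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)      # E4M3 saturates at 448; no infinities
    m, e2 = math.frexp(a)       # a = m * 2**e2 with 0.5 <= m < 1
    e = max(e2 - 1, -6)         # clamp exponent: subnormals use e = -6
    step = 2.0 ** (e - 3)       # spacing between neighbors: 3 mantissa bits
    return sign * min(round(a / step) * step, 448.0)

# Without scaling, the unscaled-E4M3 neighbors of 0.0089 are the subnormals
# 4 * 2**-9 = 0.0078125 and 5 * 2**-9 = 0.009765625:
print(quantize_e4m3_rne(0.0089))    # -> 0.009765625
```

One way to check the observed behavior would be to compare the outputs against `scale ×` this grid for whatever scale value was actually passed to the kernel.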

Thanks for reading and hope to hear back from you soon.
