This repository has been archived by the owner on Dec 3, 2024. It is now read-only.

Why does the quantized value still exceed the range of FP8 representation? #15

Open
adfad1 opened this issue Mar 30, 2024 · 0 comments

adfad1 commented Mar 30, 2024

Hi, thanks for providing such a complete toolkit, but I have some questions about it.

I used this toolkit to evaluate ResNet-18 on CIFAR-10 with FP8 in hybrid mode, and I found that the outputs of this operation still exceed the range of FP8 representation:

outputs = fpemu_cuda.forward(input.contiguous(), mode, size, inplace, scale, blocknorm, blocksize)

For example, input.data[0,0] before this operation is

tensor([[-0.0089,  0.0410,  0.0068],
        [ 0.0663,  0.0292,  0.0986],
        [ 0.0737,  0.0730,  0.0111]], device='cuda:1', dtype=torch.float16)

after this operation, output[0,0] is

tensor([[-0.0090,  0.0405,  0.0068],
        [ 0.0676,  0.0293,  0.0991],
        [ 0.0721,  0.0721,  0.0113]], device='cuda:1', dtype=torch.float16)

mode is 'E4M3_RNE'. The problem is that the first output value, -0.0090, is 1 01000 0010011100 in FP16 binary, and this value clearly still exceeds what FP8 can represent exactly. Why can -0.0089 be quantized to -0.0090 under 'E4M3_RNE' mode?
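For reference when checking representability, here is a minimal pure-Python sketch of rounding a value to the nearest OCP FP8 E4M3 value with round-to-nearest-even. This is an illustration of the format, not the toolkit's actual CUDA kernel; note also that if the kernel applies a per-tensor `scale` before quantization, the representable outputs become `scale ×` this grid rather than the unscaled grid shown here.

```python
import math

def quantize_e4m3_rne(x: float) -> float:
    """Round x to the nearest OCP E4M3 value (1 sign, 4 exp, 3 mantissa bits).

    Minimal sketch: handles normals, subnormals, and saturation at +/-448;
    Python's round() implements round-half-to-even (RNE).
    """
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)      # E4M3 saturates at 448; no infinities
    m, e2 = math.frexp(a)       # a = m * 2**e2 with 0.5 <= m < 1
    e = max(e2 - 1, -6)         # clamp exponent: subnormals use e = -6
    step = 2.0 ** (e - 3)       # spacing between neighbors: 3 mantissa bits
    return sign * min(round(a / step) * step, 448.0)

# Without scaling, the unscaled-E4M3 neighbors of 0.0089 are the subnormals
# 4 * 2**-9 = 0.0078125 and 5 * 2**-9 = 0.009765625:
print(quantize_e4m3_rne(0.0089))    # -> 0.009765625
```

One way to check the observed behavior would be to compare the outputs against `scale ×` this grid for whatever scale value was actually passed to the kernel.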

Thanks for reading and hope to hear back from you soon.
