
AssertionError: input should be in float32 type, got torch.float16 #9

Open · lawrence-ff opened this issue Mar 5, 2024 · 5 comments
@lawrence-ff

(attached screenshot: 微信图片_20240202181417)
Hi, this problem is caused by the assertions in the norm.py file: some parts require the input data type to be torch.float32, but the actual input data type is torch.float16. What is causing this problem? Is the interpreter not working?
Looking forward to your reply, thanks!
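
For reference, the assertion that fires looks roughly like the sketch below (illustrative only; the exact class and message in SST's norm.py may differ):

    import torch
    import torch.nn as nn

    class NaiveSyncBNSketch(nn.BatchNorm1d):
        """Sketch of the dtype guard that raises the error (names are illustrative)."""

        def forward(self, input):
            # The sync-BN statistics are reduced in fp32, so the layer rejects
            # half-precision input outright instead of silently casting it.
            assert input.dtype == torch.float32, \
                f'input should be in float32 type, got {input.dtype}'
            return super().forward(input)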

@georghess
Collaborator

Hi,

I think there is a bug in the mixed-precision training. A workaround is to change the forward method in the VFELayer from

    @auto_fp16(apply_to=('inputs'), out_fp32=True)
    def forward(self, inputs):
        """Forward function.

        Args:
            inputs (torch.Tensor): Voxels features of shape (M, C).
                M is the number of points, C is the number of channels of point features.

        Returns:
            torch.Tensor: point features in shape (M, C).
        """
        # [K, T, 7] tensordot [7, units] = [K, T, units]
        x = self.linear(inputs)
        x = self.norm(x)
        pointwise = F.relu(x)
        return pointwise

to

    @auto_fp16(apply_to=('inputs'), out_fp32=False)
    def forward(self, inputs):
        """Forward function.

        Args:
            inputs (torch.Tensor): Voxels features of shape (M, C).
                M is the number of points, C is the number of channels of point features.

        Returns:
            torch.Tensor: point features in shape (M, C).
        """
        # [K, T, 7] tensordot [7, units] = [K, T, units]
        x = self.linear(inputs)
        x = self.norm(x.float())
        pointwise = F.relu(x)
        return pointwise
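
The key change is the explicit x.float() cast, which guarantees the norm layer receives fp32 input (which is what the assertion in norm.py checks) regardless of how auto_fp16 casts the voxel features.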

@gorkemguzeler

Hi @georghess, I reproduced the same issue, and the proposed solution did not resolve the problem. I would appreciate any other ideas.

@georghess
Collaborator

Hi @gorkemguzeler, I'm pretty sure the solution above should work. That's what we've done on our dev branch at least, where we've switched to our own fork of SST that includes the above changes.

To help you beyond that, I'd need some more info. Could you send the entire error trace?

@gorkemguzeler

gorkemguzeler commented Sep 10, 2024

Hi @georghess, thanks a lot for your quick reply and information!

I switched my branch to dev and used your fork of SST. I did not run into the above issue this time.

One thing I noticed: when I started training on the main branch (after some updates in the norm.py and scatter_points.py files to avoid the issue above), training took around 1.5 hours per epoch. Additionally, I was getting the following warnings:

No voxel belongs to drop_level3 in shift 0
No voxel belongs to drop_level4 in shift 0

I just tried the dev branch with the forked SST; there are no such warnings during training, but it takes 11 hours per epoch.

  • Are the warnings above expected during training on the main branch? Should I resolve them for stable training?
  • What could be the reason for the huge difference in training time between these runs (e.g., a default hyperparameter, a dataset-specific method), given that I use the same GPU and dataset (nuScenes trainval split)? I also checked the lidar encoders, and both branches use SST.
  • Which branch would you recommend, given that I will implement some downstream tasks on top of your repository?

@daiduck

daiduck commented Nov 14, 2024

I also hit this error when using mmdet3d. In my case, the decorator @auto_fp16 controls precision conversion based on the fp16_enabled attribute; in some situations this attribute is set to False, so the output is not correctly converted to fp32. That's what happened to me.
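
For context, mmcv's auto_fp16 wrapper is a no-op unless the module's fp16_enabled attribute is True, so out_fp32=True has no effect on modules where the flag was never set. Below is a minimal sketch of that gating logic and of how the flag is normally switched on (assuming mmcv 1.x-style fp16 utilities such as wrap_fp16_model):

    import functools
    import torch

    def auto_fp16_sketch(apply_to=None, out_fp32=False):
        """Simplified stand-in for mmcv's auto_fp16, showing the fp16_enabled gate."""
        def wrapper(old_func):
            @functools.wraps(old_func)
            def new_func(self, *args, **kwargs):
                # If fp16_enabled was never set to True on this module, the decorator
                # does nothing: no inputs are cast to fp16 and no outputs back to fp32.
                if not getattr(self, 'fp16_enabled', False):
                    return old_func(self, *args, **kwargs)
                # ... otherwise cast the arguments named in `apply_to` to half precision,
                # call old_func, and cast the result to fp32 when out_fp32=True ...
                return old_func(self, *args, **kwargs)
            return new_func
        return wrapper

    def enable_fp16(model: torch.nn.Module) -> None:
        # Mirrors what mmcv's wrap_fp16_model does: flip the flag on every module
        # that declares it, so the auto_fp16 decorators actually take effect.
        for m in model.modules():
            if hasattr(m, 'fp16_enabled'):
                m.fp16_enabled = True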
