Hi @Gofinge, thanks for releasing the code! The main datasets that PTv3 has been trained on are quite large in scale (Waymo or room-level datasets). Would you recommend any hyperparameters or best practices when training on object-level datasets (~1 m wide at most)?
Is there any grid_size you would recommend?
Imagine these objects as models in a real-world 3D space. I think it's a good idea to freely rescale them into cubes ranging from 0.2m to 1.0m, and augment the scaled objects with random scaling to improve perception. You can then continue using a grid size of 0.02 for training.
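A minimal sketch of that preprocessing, assuming plain NumPy arrays of shape `(N, 3)`; the function names are illustrative and not part of Pointcept, whose transform pipeline provides its own grid-sampling and random-scale augmentations:

```python
import numpy as np

def rescale_to_cube(points, target_size=1.0):
    """Rescale an object so the longest side of its bounding box equals target_size (meters)."""
    points = points - points.min(axis=0)          # move to the origin
    extent = np.ptp(points, axis=0).max()         # longest bounding-box side
    return points / extent * target_size

def random_rescale(points, size_range=(0.2, 1.0)):
    """Augmentation: re-embed the object in a random cube within size_range."""
    return rescale_to_cube(points, np.random.uniform(*size_range))

def grid_sample(points, grid_size=0.02):
    """Keep one point per voxel of edge length grid_size (simple grid subsampling)."""
    voxels = np.floor(points / grid_size).astype(np.int64)
    _, keep = np.unique(voxels, axis=0, return_index=True)
    return points[np.sort(keep)]

# Example: points = grid_sample(random_rescale(raw_points), grid_size=0.02)
```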
However, keep in mind that 3D backbones are sensitive to the original grid size (i.e., the density) of the point cloud. For example, consider images: if you have a low-resolution image and upscale it to a high resolution before feeding it into a network, you can't expect good performance. Similarly, if the original object is very sparse, even if you apply a grid size of 0.02, the effective density might be lower than the grid size suggests, leading to poor performance.
To simplify:
If you have the original object (a mesh), scale it and then sample it adaptively: determine the number of sampled points based on the surface area, i.e. `num_points = total_area / (0.02 ** 2)`.
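For the mesh case, a hedged sketch of that area-based point budget; it assumes trimesh is available (any mesh library with surface-area and surface-sampling utilities would work):

```python
import trimesh

def adaptive_sample(mesh_path, grid_size=0.02, target_size=1.0):
    """Sample a mesh with a point budget proportional to its surface area,
    so the sampled density roughly matches the training grid size."""
    mesh = trimesh.load(mesh_path, force="mesh")
    # Rescale so the longest bounding-box side equals target_size (meters).
    mesh.apply_scale(target_size / mesh.extents.max())
    num_points = int(mesh.area / grid_size ** 2)  # num_points = total_area / (0.02 ** 2)
    points, _ = trimesh.sample.sample_surface(mesh, num_points)
    return points
```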
If you only have a sparse, object-level point cloud, consider scaling it down to a smaller cube to increase the density after scaling.
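If there is no mesh, one possible heuristic (my own sketch, not part of the PTv3 recipe) is to pick the scale factor from the cloud's mean nearest-neighbor spacing, so that after scaling the spacing is roughly the grid size while the object stays within the 1 m cube:

```python
import numpy as np
from scipy.spatial import cKDTree

def scale_for_density(points, grid_size=0.02, max_size=1.0):
    """Rescale a sparse cloud so its mean nearest-neighbor spacing ~ grid_size,
    without letting the object grow beyond max_size."""
    dists, _ = cKDTree(points).query(points, k=2)   # k=2: index 0 is the point itself
    mean_spacing = dists[:, 1].mean()
    extent = np.ptp(points, axis=0).max()
    factor = min(grid_size / mean_spacing, max_size / extent)
    return (points - points.min(axis=0)) * factor
```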
There are many details involved in getting good performance. Currently, object-level datasets like ModelNet40 do not sample point clouds adaptively based on object size, which is a drawback; I will try to fix these issues later.
BTW: discussion about better designs and settings for specific tasks is always welcome.