About object's center #98

Open
ForeverRuri opened this issue Jul 29, 2021 · 1 comment


Re (1): When placing the object mesh in the 3D scene, we assume it is centered at depth t_z. Note that t_z is inconsequential for the method and is only relevant for rendering the object back onto the image for visualization. What might be useful is that, given the true depth of the object's center (e.g. 2 m from the image plane) and the focal length of the camera, Mesh R-CNN will return the object at its original scale. In other words, knowing these two scalar parameters (t_z and f), one can infer the true size of the object in the world. The t_z parameter is used in the demo code here:

tc = pred_dz.abs().max() + 1.0
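One possible reading of this line is sketched below, under assumptions that are not confirmed by the repository: if `pred_dz` holds each object's depth extent and every object is centered at the shared depth `tc`, then choosing `tc` larger than the largest `|pred_dz|` keeps every object's near face at positive depth, i.e. in front of the camera. The helper `place_depth_ranges` and the sample values are hypothetical, and plain Python floats stand in for tensors.

```python
def place_depth_ranges(pred_dz, margin=1.0):
    """Hypothetical helper: center every object at a shared depth tc.

    Assumes pred_dz[i] is the depth extent of object i (an assumption,
    not taken from the Mesh R-CNN source).
    """
    # Same arithmetic as `pred_dz.abs().max() + 1.0`, on plain floats.
    tc = max(abs(dz) for dz in pred_dz) + margin
    # Each object spans [tc - |dz|/2, tc + |dz|/2] around the shared center.
    ranges = [(tc - abs(dz) / 2.0, tc + abs(dz) / 2.0) for dz in pred_dz]
    return tc, ranges


tc, ranges = place_depth_ranges([0.4, -1.5, 0.9])
print(tc)  # 2.5
print(ranges)
```

Under this reading, `abs()` makes the offset independent of the sign of the predicted values, and the `+ 1.0` is simply a safety margin rather than a quantity with physical meaning.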

Re (2): I haven't experimented much with the focal_length value. However, the shape of the prediction shouldn't change when using different focal lengths: the choice of focal_length only scales the prediction. This scaling happens here:

meshes = transform_meshes_to_camera_coord_system(
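The relationship between t_z, f, and the true object size can be illustrated with the standard pinhole-camera relation, in which an object of metric extent S at depth t_z spans s = f * S / t_z pixels. This is a minimal sketch, assuming the focal length is expressed in pixels; `metric_extent` and the numbers are hypothetical and are not part of the Mesh R-CNN codebase.

```python
def metric_extent(pixel_extent, t_z, focal_length):
    """Invert the pinhole projection s = f * S / t_z to recover S.

    pixel_extent: size of the object in the image, in pixels.
    t_z: depth of the object center, in meters.
    focal_length: camera focal length, in pixels.
    """
    return pixel_extent * t_z / focal_length


# A box 200 px wide and 100 px tall, 2 m from the camera, f = 500 px.
width = metric_extent(200.0, t_z=2.0, focal_length=500.0)
height = metric_extent(100.0, t_z=2.0, focal_length=500.0)
print(width, height)  # 0.8 0.4

# Doubling f halves both extents: the scale changes, the aspect ratio does not.
width_2f = metric_extent(200.0, t_z=2.0, focal_length=1000.0)
height_2f = metric_extent(100.0, t_z=2.0, focal_length=1000.0)
print(width_2f, height_2f)  # 0.4 0.2
```

With t_z fixed, a different focal length rescales every dimension by the same factor, which matches the statement that the choice of focal_length changes the scale of the prediction but not its shape.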

Re (3): We assume a perspective camera throughout the codebase for both ShapeNet and Pix3D, as the ground truth is indeed provided using perspective cameras. There are a few changes that need to happen in the roi-based preparation stage in detectron2 and in the final inference transforms (see link in (2)).

Originally posted by @gkioxari in #52 (comment)


ForeverRuri commented Jul 29, 2021

@gkioxari
Here we say

tc = pred_dz.abs().max() + 1.0

is the center of the object.

Why not use the midpoint between the minimum and maximum vertex positions along the z-axis as t_z to supervise the training?

And could you explain the meaning of the `tc = pred_dz.abs().max() + 1.0` operation? I can't understand the abs and the + 1.0. Is it just a random sampling operation that guarantees the render position is to the right of the mesh along the z-axis?
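For concreteness, the midpoint proposed above could be computed as in the following sketch. The function name `z_center_from_vertices` and the sample vertices are hypothetical; vertices are assumed to be (x, y, z) tuples in camera coordinates.

```python
def z_center_from_vertices(verts):
    """Midpoint of the mesh's extent along the z-axis."""
    zs = [v[2] for v in verts]
    return (min(zs) + max(zs)) / 2.0


# Three hypothetical vertices with depths 1.5, 2.5 and 2.4.
verts = [(0.0, 0.0, 1.5), (0.2, -0.1, 2.5), (-0.3, 0.4, 2.4)]
print(z_center_from_vertices(verts))  # 2.0
```

This is the center of the bounding extent, which generally differs from the mean of the vertex depths when the vertices are unevenly distributed along z.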
