Re (1): When placing the object mesh in the 3D scene we assume it's centered around t_z. Note that t_z is inconsequential for the method and is only relevant for rendering the object back onto the image for visualization. What might be useful to know is that, given the true depth center of the object (e.g. 2 m from the image plane) and the focal length of the camera, Mesh R-CNN returns the object at its original scale. In other words, knowing these two scalar parameters (t_z and f), one can infer the true size of the object in the world. The t_z parameter is used in the demo code here: meshrcnn/demo/demo.py, line 80 in ab0762f.

Re (2): I haven't played with the focal_length value much. However, the prediction shouldn't change when using different focal lengths: the choice of focal_length only scales the prediction but doesn't change its shape. This scaling happens here: meshrcnn/demo/demo.py, line 95 in ab0762f.

Re (3): We assume a perspective camera throughout the codebase for both ShapeNet and Pix3D, as the ground truth is indeed provided using perspective cameras. There are a few changes that would need to happen in the ROI-based preparation stage in detectron2 and in the final inference transforms; see the link in (2).

Originally posted by @gkioxari in #52 (comment)
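For intuition on (1) and (2), here is a minimal geometry sketch (not code from the meshrcnn repo; `project` and `recover_metric_verts` are illustrative names). Under a pinhole camera, the image of an object depends only on the ratio of its metric size to its depth, so a single image cannot disambiguate scale; fixing the true depth center t_z (together with f) pins it down:

```python
import torch

def project(verts: torch.Tensor, f: float) -> torch.Tensor:
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    return f * verts[:, :2] / verts[:, 2:3]

# A small mesh centered at depth t_z = 2.0 (e.g. 2 m from the image plane).
verts = torch.tensor([[0.1, 0.2, 2.0], [-0.3, 0.1, 2.2], [0.0, -0.2, 1.9]])
f = 20.0

# Jointly rescaling size and depth leaves the projection unchanged,
# which is why the image alone cannot fix metric scale ...
s = 3.0
assert torch.allclose(project(verts, f), project(s * verts, f))

# ... but knowing the true depth center recovers it: rescale the prediction
# so that its depth center lands at the known t_z.
def recover_metric_verts(pred_verts: torch.Tensor, t_z_true: float) -> torch.Tensor:
    t_z_pred = pred_verts[:, 2].mean()  # depth center of the prediction
    return pred_verts * (t_z_true / t_z_pred)
```

This also explains why changing focal_length only rescales the prediction without changing its shape: the whole mesh is multiplied by a single scalar.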
Why not use the mean of the minimum and maximum vertex positions along the z-axis (i.e. the midpoint of the mesh's z-extent) as the t_z to supervise training?

And could you explain the meaning of this operation? I don't understand the abs and plus-1 operations: is it just a random sampling operation that guarantees the render position lies in front of the mesh along the z-axis?
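For concreteness, here is a sketch of the two quantities the questions refer to. This does not reproduce the actual operation in the meshrcnn code being asked about; it only illustrates the midpoint t_z proposed above and one generic reading of an `abs() + 1`-style guard:

```python
import torch

verts = torch.randn(100, 3)  # placeholder vertex positions, shape (N, 3)

# The t_z proposed in the question: the midpoint of the mesh's z-extent.
t_z_mid = (verts[:, 2].min() + verts[:, 2].max()) / 2

# A generic reading of an `abs() + 1`-style guard: whatever raw value is
# sampled, taking the absolute value and adding 1 forces t_z >= 1, keeping
# the mesh a positive distance in front of the camera along the z-axis.
raw = torch.randn(())          # hypothetical sampled value
t_z_guarded = raw.abs() + 1.0  # always >= 1
```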