Where to find the camera intrinsics? #4

Open
Nimolty opened this issue Mar 25, 2024 · 16 comments

Comments

@Nimolty

Nimolty commented Mar 25, 2024

Hello! Thanks for open-sourcing this great work!
I am trying to analyze the raw data (27TB), but I could not find the camera intrinsics. Could you please tell me where they are stored?

@Nimolty
Author

Nimolty commented Mar 25, 2024

Furthermore, I would like to ask where to find the raw depth images in the raw data (27TB). Thank you very much!

@Nimolty
Author

Nimolty commented Mar 25, 2024

Thanks, I have found the code in droid/scripts/convert/svo_to_mp4.py, which lets me convert the original .svo file into RGB images, depth, and point clouds. However, I find that there are a lot of "nan" values in the depth and point cloud. Could you please give me some advice about this?

@ashwin-balakrishna96

ashwin-balakrishna96 commented Mar 25, 2024

Hi Nimolty:

Thanks for the questions, I created a draft PR here that should help answer most of your questions and show how you can work with the provided depth information in DROID. To summarize:

  1. The camera intrinsics can be accessed here.
  2. We do not store raw depth data in DROID, but you can use the ZED stereo depth model to get depth estimates if you want (this is the example shown in the PR above). In my experience, these depth estimates are not very good, but we have found that we can obtain high quality depth estimates using recent stereo depth models given the camera intrinsics and baseline. I would recommend leveraging these to obtain better depth estimates.
  3. The NaN values in the ZED depth estimates indicate that the depth could not be estimated correctly, either because of occlusion or because the value is an outlier (see description here). You can see how I deal with these when visualizing things in the PR, but for better results I would recommend using a more sophisticated stereo depth model. Internally at TRI, we have found very good results by using the stereo model proposed in this paper, but unfortunately we are not quite ready to open-source it yet.
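
For readers working with these depth maps, here is a minimal sketch of one way to handle the invalid pixels: mask out NaN/inf values and back-project the remaining pixels into 3D with the pinhole intrinsics. This is not the code from the PR. The function name `backproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are illustrative, and the sketch assumes the depth map is already rectified and aligned with the image the intrinsics belong to.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Turn an (H, W) depth map into an (N, 3) point cloud, skipping invalid pixels.

    Hypothetical helper: fx, fy, cx, cy are pinhole intrinsics in pixels and
    depth is in whatever metric unit the depth source uses.
    """
    depth = np.asarray(depth, dtype=np.float32)
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # NaN/inf mark pixels where stereo matching failed; also drop non-positive depth.
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1), valid


# Tiny made-up depth map with two invalid pixels.
depth = np.array([[1.0, np.nan], [2.0, np.inf]])
points, valid = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape, int(valid.sum()))
```

Returning the `valid` mask alongside the points makes it possible to index the matching RGB pixels with the same mask.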

@Jay-Ye

Jay-Ye commented Mar 25, 2024

Thanks for the answer!
Would you please consider including the camera intrinsics in the metadata*.json of each episode? It would be much more convenient to have direct access to both the extrinsics and intrinsics without having to download the whole 27TB of data.

@Nimolty
Author

Nimolty commented Mar 26, 2024

Thank you for your detailed advice!
For the second point, you mention that high quality depth estimates can be obtained with recent stereo depth models. Could you please recommend some options? Are these models included in the ZED Python API, or are they only available as separate offline tools?

@ashwin-balakrishna96

ashwin-balakrishna96 commented Mar 28, 2024

You can try something like this. This is a lower quality open-source version of an internal model that's been working pretty well for us at TRI.
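
Stereo models of this kind typically predict a disparity map rather than metric depth, which is why the intrinsics and baseline mentioned earlier are needed. A minimal sketch of the conversion, assuming rectified images, disparity in pixels, and the standard relation depth = fx * baseline / disparity. The function name and the numbers in the usage line are made up for illustration and are not taken from DROID or from the linked model.

```python
import numpy as np


def disparity_to_depth(disparity, fx, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (pixels) to depth (meters) for a rectified stereo pair.

    Hypothetical helper: fx is the focal length in pixels and baseline_m the
    distance between the two camera centers in meters.
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    # Start from NaN so pixels without a usable disparity stay marked invalid.
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    valid = np.isfinite(disparity) & (disparity > min_disparity)
    depth[valid] = fx * baseline_m / disparity[valid]
    return depth


# Illustrative values only: fx and baseline must come from the camera calibration.
depth = disparity_to_depth([[10.0, 0.0]], fx=500.0, baseline_m=0.12)
print(depth)
```

Zero or negative disparities are left as NaN instead of producing infinite depth, so downstream code can treat them the same way as the NaN values in the ZED output.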

@Zhangwenyao1

Thanks for your great work!
I would like to know whether this link contains the extrinsic matrices of the three cameras described in the paper.
If not, how can I get the extrinsic matrices?

@kpertsch
Collaborator

The extrinsics information is published as part of the metadata in the droid_raw subset, see here: https://droid-dataset.github.io/droid/the-droid-dataset.html#accessing-raw-data (in the metadata-*.json files I believe)

@StarCycle

@kpertsch @ashwin-balakrishna96 This is an example of the metadata of an episode in the raw dataset. If I understand correctly, wrist_cam_extrinsics is only the initial extrinsic matrix of an episode (since the wrist camera is moving), right?

And thank you for proposing #5 so we can get depth and intrinsics from the svo files of the raw dataset!

If possible, please add the depth, intrinsics and extrinsics to the RLDS dataset!

{"uuid": "AUTOLab+5d05c5aa+2023-07-07-09h-42m-23s", 
"lab": "AUTOLab", 
"user": "Zehan Ma",
"user_id": "5d05c5aa", 
"date": "2023-07-07", 
"timestamp": "2023-07-07-09h-42m-23s", 
"hdf5_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/trajectory.h5", 
"building": "BAIR", 
"scene_id": 5207831207, 
"success": true, 
"robot_serial": "fr3-295341-1326595", 
"r2d2_version": "1.3", 
"current_task": "Use cup to pour something granular (ex: nuts, rice, dried pasta, coffee beans)", 
"trajectory_length": 472, 
"wrist_cam_serial": "18026681", 
"ext1_cam_serial": "22008760", 
"ext2_cam_serial": "24400334", 
"wrist_cam_extrinsics": [0.26852859729525597, 0.12468921693797806, 0.38842643469874216, 2.602529290663223, -0.1020245067022938, 1.8903797737369048], 
"ext1_cam_extrinsics": [0.4039752945788883, 0.47318839256292644, 0.27170584157181743, -1.6827143529296786, 0.07550227077885108, -2.668292283962724], 
"ext2_cam_extrinsics": [0.2596757315060087, -0.36626259649963777, 0.24849304837972613, -1.742115402153725, -0.0012127426938948194, -0.7149867215760838], 
"wrist_svo_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/SVO/18026681.svo", 
"wrist_mp4_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/MP4/18026681.mp4", 
"ext1_svo_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/SVO/22008760.svo", 
"ext1_mp4_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/MP4/22008760.mp4", 
"ext2_svo_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/SVO/24400334.svo", 
"ext2_mp4_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/MP4/24400334.mp4", 
"left_mp4_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/MP4/22008760.mp4", 
"right_mp4_path": "success/2023-07-07/Fri_Jul__7_09:42:23_2023/recordings/MP4/24400334.mp4"}
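
Each `*_cam_extrinsics` entry above is a 6-vector rather than a matrix. A minimal sketch of turning one into a 4x4 homogeneous transform follows. It assumes the layout is [x, y, z, roll, pitch, yaw] with angles in radians and rotations applied about the fixed x, then y, then z axes. The thread does not confirm this convention, nor whether the transform maps camera to robot base or the reverse, so check both against the DROID codebase before relying on it. The helper name `pose6_to_matrix` is hypothetical.

```python
import json

import numpy as np


def pose6_to_matrix(pose):
    """Build a 4x4 homogeneous transform from [x, y, z, roll, pitch, yaw]."""
    x, y, z, roll, pitch, yaw = pose
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cw, sw = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    rot_y = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rot_z = np.array([[cw, -sw, 0.0], [sw, cw, 0.0], [0.0, 0.0, 1.0]])
    transform = np.eye(4)
    # ASSUMPTION: rotate about fixed x, then y, then z, i.e. R = Rz @ Ry @ Rx.
    transform[:3, :3] = rot_z @ rot_y @ rot_x
    transform[:3, 3] = [x, y, z]
    return transform


# Excerpt of the metadata shown above (only one field, to keep the sketch short).
metadata = json.loads(
    '{"ext1_cam_extrinsics": [0.4039752945788883, 0.47318839256292644, '
    '0.27170584157181743, -1.6827143529296786, 0.07550227077885108, '
    '-2.668292283962724]}'
)
T_ext1 = pose6_to_matrix(metadata["ext1_cam_extrinsics"])
print(T_ext1)
```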

@kpertsch
Collaborator

Re adding things to RLDS: I will try to add the extrinsic info when I recompile the RLDS data! Depth increases dataset size a lot since it's not compressible and needs to be stored as float32 tensors (and thereby makes loading during training slower), so we intentionally didn't include it, but try to provide utilities for people to compute it themselves if they would like to!
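
To give a rough sense of the size argument, here is a back-of-the-envelope sketch of uncompressed float32 depth storage for a single episode. The frame count (472) and the three cameras come from the metadata sample earlier in this thread. The 720x1280 resolution is an assumption made for illustration and may not match what an RLDS release would actually store.

```python
def depth_storage_bytes(height, width, num_frames, num_cameras, bytes_per_value=4):
    """Uncompressed size of float32 depth maps for one episode, in bytes."""
    return height * width * bytes_per_value * num_frames * num_cameras


# ASSUMPTION: 720x1280 depth maps; 472 frames and 3 cameras as in the sample metadata.
size = depth_storage_bytes(height=720, width=1280, num_frames=472, num_cameras=3)
print(f"{size / 1e9:.2f} GB for one episode")
```

Under these assumptions a single 472-step episode already needs about 5.2 GB of raw depth, which illustrates why depth was left out of the RLDS data.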

@StarCycle

StarCycle commented May 21, 2024

I can understand! But would you like to also add intrinsics to the RLDS data? @kpertsch

I want to train a policy that can adapt to any intrinsics / extrinsics. My plan is to include such info in the policy input...

@kpertsch
Collaborator

Got it -- yeah we can add intrinsic info too

@StarCycle

Great thanks!

@oym1994

oym1994 commented Jul 12, 2024

Re adding things to RLDS: I will try to add the extrinsic info when I recompile the RLDS data! Depth increases dataset size a lot since it's not compressible and needs to be stored as float32 tensors (and thereby makes loading during training slower), so we intentionally didn't include it, but try to provide utilities for people to compute it themselves if they would like to!

Hi, have you finished recompiling the RLDS data? I also need it!

@StarCycle

Hi @kpertsch
One more question: The current extrinsics are for the third-person cameras. How can I get the extrinsics of the wrist camera?

@mumianyuxin

@StarCycle I ran into the same problem. Did you solve it?

@kpertsch what does wrist_cam_extrinsics in metadata.json mean? Does it stand for the wrist camera pose at the first frame of the episode? Is it possible to combine wrist_cam_extrinsics with the gripper pose of the initial frame to obtain cam2gripper, and then compute the wrist camera extrinsics for every frame by applying matmul(cam2gripper, gripper_pose)?

Hi @kpertsch One more question: The current extrinsics are for the third-person cameras. How can I get the extrinsics of the wrist camera?
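
The idea in the question above can be sketched as follows. This is not a confirmed answer from the maintainers. It assumes that wrist_cam_extrinsics is the camera pose in the robot base frame at a frame where the gripper pose is also known, that the camera is rigidly mounted on the gripper, and that all poses are 4x4 camera-to-base or gripper-to-base transforms. Under these conventions the per-frame gripper pose multiplies from the left, so the order differs from matmul(cam2gripper, gripper_pose); with the inverse convention (base-to-camera) the order would flip. All names are hypothetical.

```python
import numpy as np


def camera_poses_from_gripper(T_base_cam0, T_base_gripper0, T_base_gripper_seq):
    """Propagate a rigidly mounted camera pose along a gripper trajectory.

    ASSUMPTION: every argument is a 4x4 transform expressed in the robot base
    frame, and T_base_cam0 was measured at the same time as T_base_gripper0.
    """
    # Fixed mount: the camera pose expressed in the gripper frame.
    T_gripper_cam = np.linalg.inv(T_base_gripper0) @ T_base_cam0
    # Camera pose in the base frame for every gripper pose in the sequence.
    return [T_base_gripper @ T_gripper_cam for T_base_gripper in T_base_gripper_seq]


def translation(x, y, z):
    """Small helper for the usage example: a pure-translation transform."""
    transform = np.eye(4)
    transform[:3, 3] = [x, y, z]
    return transform


# Made-up poses: the camera sits 0.1 m above the gripper along z.
gripper_traj = [translation(0.0, 0.0, 1.0), translation(1.0, 0.0, 1.0)]
cam_traj = camera_poses_from_gripper(
    translation(0.0, 0.0, 1.1), gripper_traj[0], gripper_traj
)
print(cam_traj[1][:3, 3])
```

A quick sanity check for real data is that the first returned pose reproduces the measured camera pose; if it does not, the frame or multiplication convention is probably different from the one assumed here.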
