Inquiry: Suitability of tdmpc2 for Autonomous Drone Racing #25

Closed
ErinTUDelft opened this issue Apr 2, 2024 · 6 comments
@ErinTUDelft

Dear Nicklas,

Thank you for this great library! I am currently working on my thesis about reinforcement learning for autonomous drone racing and was originally considering using Dreamerv3, but I now think that tdmpc2 is more suitable.

The observations are the position, velocity, and orientation of the quadcopter, the action space is the RPMs of the four rotors, and the goal is to fly through various gates as quickly as possible. Later on I will also incorporate visual input, but for now the ground-truth state information of the drone will be fed to the algorithm.
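For concreteness, here is a minimal sketch of how I would spell out that task with a Gymnasium-style interface (the class name, dimensions, and reward are just illustrative placeholders, not my actual simulator):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class DroneRacingEnv(gym.Env):
    """Quadrotor must pass a sequence of gates as quickly as possible (placeholder env)."""

    def __init__(self):
        # Observation: position (3), velocity (3), orientation quaternion (4).
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
        # Action: RPM command for each of the four rotors, normalized to [-1, 1].
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(10, dtype=np.float32)  # placeholder initial state
        return obs, {}

    def step(self, action):
        # Placeholder dynamics; the reward would be progress toward the next
        # gate minus a time penalty, with termination on crash or course completion.
        obs = np.zeros(10, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```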

I thus wanted to ask whether you think this would be a suitable use case for tdmpc2. Lastly, I was wondering whether any work has been done on integrating it with the Nvidia Isaac Sim/Gym simulation environment?

Kind regards,
Erin

@nicklashansen
Owner

Hi @ErinTUDelft, thanks for reaching out! This sounds like a great use case for TD-MPC2.

We don't have direct support for Nvidia Isaac Sim/Gym yet, but there are a few ongoing efforts to vectorize the algorithm. Branch episodic-rl adds support for episodic RL (episodes with variable length), branch vectorized_env adds support for vectorized environments in MuJoCo (CPU) with fixed episode length, and this commit experimentally adds support for vectorized environments in dflex with variable episode length.

I'd be interested in eventually adding support for Isaac as well but we don't have that at the moment. Using the dflex implementation as a starting point might be the easiest way forward for you. I hope this helps!

As an aside: if you are planning to deploy this on a real drone eventually, inference speed may or may not be a concern. I'd advise you to keep planning enabled during deployment if compute/latency permits; otherwise, keep in mind that disabling planning and simply using the model-free policy learned with TD-MPC2 will give you inference speed comparable to model-free algorithms like SAC/PPO, potentially at the cost of some performance (which might be tolerable for your problem).
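As a rough sketch of what that deployment loop could look like (not verbatim repo code; the `mpc` config key and the `act()` signature are assumptions and may differ slightly from what's in the repo):

```python
import torch

def run_episode(agent, env, use_planning: bool = True) -> float:
    """Roll out one episode with a trained TD-MPC2 agent on a gym-like env."""
    # Assumed config key: True -> plan with the world model at every step
    # (slower, usually better); False -> use only the learned policy prior
    # (latency close to model-free methods like SAC/PPO).
    agent.cfg.mpc = use_planning
    obs, done, t, total_reward = env.reset(), False, 0, 0.0
    while not done:
        with torch.no_grad():
            action = agent.act(obs, t0=(t == 0), eval_mode=True)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        t += 1
    return total_reward
```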

@ErinTUDelft
Author

ErinTUDelft commented Apr 10, 2024 via email

@nicklashansen
Owner

@ErinTUDelft Oh that's great, thanks for sharing! We have not run inference on any of the systems that you mention, but inference speed should be comparable to TD-MPC1 :-) For reference, I believe we have been able to run inference at up to ~50 Hz on a workstation with a 4090 GPU.
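If you want a number for your own hardware, a quick way to estimate it (a sketch; assumes a trained `agent` and an `env` as in the snippet above, running on a CUDA GPU):

```python
import time
import torch

def measure_act_hz(agent, env, n_steps: int = 200) -> float:
    """Approximate control rate of agent.act() alone; env stepping is excluded."""
    obs, _ = env.reset() if isinstance(env.reset(), tuple) else (env.reset(), None)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for t in range(n_steps):
        with torch.no_grad():
            agent.act(obs, t0=(t == 0), eval_mode=True)
    torch.cuda.synchronize()
    return n_steps / (time.perf_counter() - start)

# Example: print(f"~{measure_act_hz(agent, env):.0f} Hz")
```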

@ErinTUDelft
Author

Hey @nicklashansen, interesting that you haven't yet used it for inference on the Jetson series, which is the predominant GPU for small robots; I don't see many drones carrying a 4090 anytime soon ;) For drones especially, fast inner-loop control is important (~300 Hz), but if TD-MPC is usable one control layer further out, it could still be very worthwhile.

The problem the author of OmniDrones found was that planning causes the model to train much more slowly than other algorithms such as PPO, which we discussed here (btx0424/OmniDrones#67). I verified this and ran into the same problems, but thought it might also just be due to an improper implementation. That is why I'm very much looking forward to seeing your Isaac Sim TD-MPC version; do you have an indication of when that will be finished?

@nicklashansen
Owner

No ETA, but I think it will be a while still. That said, I don't think you can expect off-policy algorithms in general to vectorize as well as PPO, regardless of whether you use planning or not. SAC vectorization is usually on the order of ~10 environments vs. ~16k for PPO.

@nicklashansen
Owner

Closing this issue due to inactivity. Please feel free to reopen if any new questions arise!
