Docker Discussion #14
Thank you Andrej. The points you raise are very good because they tell us about a different setup than the one we usually use in our team (nvidia-docker2, which I basically copied from the SubT instructions). This is one of the things we were afraid of: that people come with any random setup and we're not prepared for them. I appreciate you taking the time to try things out and write this up!
There is an
So, Docker is optional. We are using that as the default because it's much easier than asking people to install ROS Humble and Gazebo Garden themselves - which they can do, but we don't think there's enough time and WiFi bandwidth for everyone in the 1.5 hours, hence Docker. These are the latest distributions, but they are not officially supported side by side (Humble comes with Fortress). I know, I know, Yeah... You're right, it would be good to add a sentence to clarify this is the intention.
Yes! We already have a DockerHub repo created (had to go through some logistics for that) here https://hub.docker.com/r/osrf/icra2023_ros2_gz_tutorial
I'm not familiar with this flag, but we're reading here https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container that "Please note, the flag --gpus all is used to assign all available gpus to the docker container."
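As a rough illustration of that flag choice, here is a minimal sketch that picks between the two options from a Docker version string. It assumes `--gpus` is available from Docker 19.03 onward; the function name `docker_gpu_args` is hypothetical and not part of the tutorial's run.bash.

```bash
# Hypothetical helper: choose GPU arguments for `docker run` from a
# Docker version string such as "20.10.21".
# Assumption: `--gpus` is available from Docker 19.03 onward; older
# versions fall back to the nvidia-docker2 style `--runtime nvidia`.
docker_gpu_args() {
    local version major rest minor
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text between the first and second dot
    # `[ ... -ge ... ]` compares decimal integers, so "03" is read as 3.
    if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
        echo "--gpus all"
    else
        echo "--runtime nvidia"
    fi
}
```

It could be fed the local daemon's version with `docker_gpu_args "$(docker version --format '{{.Server.Version}}')"`. Either flag still needs a working NVIDIA driver and container runtime on the host, which this sketch does not check.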
For this tutorial, since it's a one-day thing, we do have to trade off what is more guaranteed to work against what is correct. That is not what we usually do in ROS and Gazebo development, but for the 1.5 hours to run smoothly, we have to do this to save everybody's time. I would opt for a foolproof way that works, which might be manual like with --no-nvidia, rather than doing something fancy to detect things. But it would be great if you could add it to the docker/README.md documentation as a future work or troubleshooting section!
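For such a troubleshooting note, the detection idea could be sketched roughly as below. This is only an assumption-laden sketch: the helper names are hypothetical, the check looks only at vendor lines in `lshw -C display` output, and it says nothing about whether the NVIDIA driver is installed correctly.

```bash
# Hypothetical helper: succeed if text in the format printed by
# `lshw -C display` contains a vendor line that names NVIDIA.
# This only looks at the reported vendor; it does not verify that the
# NVIDIA driver or container runtime is installed.
has_nvidia_vendor() {
    printf '%s\n' "$1" | grep -i 'vendor' | grep -qi 'nvidia'
}

# Hypothetical wrapper: print the extra run.bash argument to use.
# Prints "--no-nvidia" when no NVIDIA vendor line is found, else nothing.
nvidia_arg_from_lshw() {
    if has_nvidia_vendor "$1"; then
        echo ""
    else
        echo "--no-nvidia"
    fi
}
```

A caller would pass the real output, for example `nvidia_arg_from_lshw "$(lshw -C display 2> /dev/null)"`. Keeping the manual `--no-nvidia` argument as the documented path and treating this as an optional convenience matches the "foolproof first" preference above.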
Yeah, we found this to work on Wayland (one of the organizers' setups). It is more permissive than we'd like, but Wayland is difficult to make work with Gazebo and Rviz, in the sense that we know people use different environment variables to make it work. We tried a couple, and decided it wasn't worth wasting people's time on the day to figure it out. So we opted for a permissive approach that works for everybody. I don't have Wayland, and this doesn't break anything for me, so it works for non-Wayland too.
Oof good point. @clalancette should we add network config to
That's our worry too. Let's add Ubuntu Jammy to the main README. We are planning on sending an email to the registered participants (with luck these are the people who show up) about system requirements, but it is much better to put it in the README. Would you mind adding a section in a PR? Probably under a new heading called Requirements, right after the Overview section (with a quick access link at the top too).
Docker was our solution to platforms. Mac and Windows are experimental for Gazebo natively, so Docker is already a step better. If they have an Ubuntu VM, that is the best. Otherwise Mac and Windows will be hard.
Another way we're tackling this is to pair people up. That way at least they have 2 shots at having the right system. If it happens on the day that many pairs cannot get the right setup, then I'll present the tutorial so that they can at least see what's supposed to happen. Otherwise our Plan A is to let people follow the tutorial at their own pace, with me as a floater helping out; nobody would be talking at people while they're trying to do a tutorial.
Please feel free to open PRs. Basically the rule of thumb we're leaning toward is: the simplest and quickest solutions that make most people's machines work in 1.5 hours (without making things overly insecure). Thank you!!
Hello,
I have some Docker-related topics for discussion after looking through the current Docker setup. Please let me know what you think about these points and whether I should create a PR to address some of them.
Installation instructions for Docker
In the current prerequisites, the reader is only instructed to install something (Docker with `nvidia-docker2`) if their machine has an NVIDIA GPU. Readers without an NVIDIA GPU get no instructions at all. Therefore, I think something more general about Docker being a requirement should be specified, as many participants might have never used Docker before.
Using Docker as a non-root user
The current scripts in this repository assume that a non-root user can manage Docker (e.g. `docker run ...` instead of `sudo docker run ...`). Although it is something I always do, adding `${USER}` to the `docker` group is technically an extra post-installation step that participants might skip, which would result in errors when trying to run the included helper scripts. Therefore, this requirement might also need to be specified.
Installation script for Docker
Not always, but I often include a `.docker/host/install_docker.bash` script in my repositories just in case somebody wants to use it. It makes sure `wget` is present, then installs Docker from https://get.docker.com (via `wget ... | sh`). If an NVIDIA GPU is detected, then either `nvidia-container-toolkit` or `nvidia-docker2` is installed via `apt` based on the version of Docker. Lastly, it prompts the user whether to add `${USER}` to the `docker` group [Y/n].
Maybe a similar script could be added to simplify the setup for some participants? Of course, it would work only on specific systems (Debian/Ubuntu) and there might even be some exceptions there. It is not something that I use to configure my systems, so I have not tested it a lot and there might be some issues (e.g. the NVIDIA GPU detector might not be 100% correct).
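The group-membership part of such a script might look roughly like the sketch below. It is not taken from the `install_docker.bash` mentioned above; both function names are hypothetical, and the hint assumes the usual `usermod -aG docker` post-installation step.

```bash
# Hypothetical helper: succeed if group "$2" appears in the
# space-separated group list "$1" (the format printed by `id -nG`).
user_in_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print "ok" or a hint when the docker group is missing.
docker_group_hint() {
    if user_in_group "$1" docker; then
        echo "ok"
    else
        echo "run: sudo usermod -aG docker \$USER (then log out and back in)"
    fi
}
```

Calling `docker_group_hint "$(id -nG)"` before the helper scripts run would turn a confusing permission error into a readable instruction.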
Pre-built Docker image
This might NOT be possible because of the `user_id` ARG in the current `Dockerfile`. However, it takes 6-7 minutes to build the Docker image (with NVIDIA) on a "decent" laptop with a stable Internet connection, and this could be much longer when participants try to do it on-site on various machines with a possibly sub-optimal network connection. Building the image on-site might be alright, but I believe the setup time could be significantly reduced if the Docker image was pre-built and pushed to Docker Hub (or something similar like ghcr.io).
Probably a weird and not recommended use case, but participants wouldn't even need to clone the repository then and could just run `bash -c "$(wget -qO - https://raw.githubusercontent.com/osrf/icra2023_ros2_gz_tutorial/main/docker/run.bash)" -- <args>`. 😆
NVIDIA GPU (`--runtime nvidia` OR `--gpus all`)
The current instructions and the helper run.bash script use `nvidia-docker2` with `--runtime nvidia` for participants with an NVIDIA GPU. From my understanding, with Docker version `>19.3`, `--gpus all` is enough if the system has `nvidia-container-toolkit`/`nvidia-container-runtime`. Underneath, `nvidia-docker2` actually depends on `nvidia-container-runtime`. Therefore, `--gpus all` should always work with Docker version `>19.3` even if only `nvidia-docker2` was explicitly installed (whereas `--runtime nvidia` won't work if only `nvidia-container-toolkit`/`nvidia-container-runtime` were installed).
Therefore, something like this condition could be added (L75-L79 in my `run.bash`). As mentioned before, a similar check is done in `install_docker.bash`.
Automatic detection of NVIDIA GPU
Similarly, I don't know how foolproof `$(lshw -C display 2> /dev/null | grep vendor) =~ NVIDIA` is at detecting an NVIDIA GPU, but it could be used instead of explicitly passing the `--no-nvidia` argument. It certainly does not detect whether the NVIDIA driver is installed correctly, which is needed for either option to work. So maybe not...
For non-NVIDIA GPUs, I have seen something like `--device=/dev/dri` before but never tested it. Maybe it is not even required with the `--privileged` option that is used right now?
Is `xhost +` necessary for GUI because of Wayland?
I noticed #11 and I don't use Wayland, so I won't go into depth. But I am wondering if `xhost +` is needed only for Wayland (because I see `-e XAUTHORITY=$XAUTH` and other options that usually work for me on their own).
Isolation of ROS 2 communication and Gazebo transport
As all participants might be on the same local network, maybe there is a need to partition/isolate their communication as a precaution. I am not exactly sure what the proper way is, but maybe something like this?
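One possible way to sketch such isolation (not necessarily what was proposed here) is to give each participant their own `ROS_DOMAIN_ID` for ROS 2 and their own `GZ_PARTITION` for Gazebo Transport, passed into the container as environment variables. The function names below are hypothetical, and deriving the domain ID from a name via `cksum` is just an assumption for illustration.

```bash
# Hypothetical helper: map a participant name to a ROS_DOMAIN_ID.
# Assumption: IDs 0-101 are used, the range the ROS 2 documentation
# describes as safe on all platforms. Two names can map to the same
# ID, so collisions between participants are possible.
domain_id_for() {
    local checksum
    checksum="$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)"
    echo $(( $checksum % 102 ))
}

# Print `docker run` environment arguments that give one participant
# their own ROS 2 domain and Gazebo Transport partition.
isolation_env_args() {
    echo "-e ROS_DOMAIN_ID=$(domain_id_for "$1") -e GZ_PARTITION=$1"
}
```

The output of `isolation_env_args "$USER"` could be appended to the `docker run` command line. Handing out fixed numbers per pair on the day would avoid collisions entirely and might be simpler.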
I remember having an issue with similar isolation because of disabled multicast on the loopback network interface, so something like `ifconfig lo multicast` might also be necessary?
Hardware and OS of participants
Now for the difficult point. Even with Docker, a lot of the setup might still be restricted to Linux (or even Ubuntu). The last time I tried to run Gazebo/Rviz under Docker on Windows 10 (Docker Desktop, not WSL 1/2), I couldn't get the GUI working. Maybe it is different with WSL now, but I have not tried (and I'm not sure how long it would take to set up). Maybe macOS works fine (x86_64/arm64)? But even on Ubuntu, something like broken NVIDIA drivers could prevent the GUI from working, or specific hardware could negatively impact rendering or something else. Are the participants recommended to bring some specific system, and do you have some insights/suggestions on how to tackle this?
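If Ubuntu Jammy ends up being the stated requirement, participants could check their machine ahead of time with something like the sketch below. The function name is hypothetical; it only inspects text in the `/etc/os-release` format and says nothing about GPU drivers or rendering.

```bash
# Hypothetical helper: succeed if os-release style text (the format of
# /etc/os-release) describes Ubuntu with the codename "jammy".
is_ubuntu_jammy() {
    printf '%s\n' "$1" | grep -q '^ID=ubuntu$' &&
        printf '%s\n' "$1" | grep -q '^VERSION_CODENAME=jammy$'
}
```

For example, `is_ubuntu_jammy "$(cat /etc/os-release)" || echo "This tutorial targets Ubuntu Jammy"` would print a warning on any other system.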