Host networking fails due to legacy iptables config #443

Closed
lukechilds opened this issue Sep 3, 2023 · 3 comments

Comments

@lukechilds

lukechilds commented Sep 3, 2023

Running a second Docker daemon via dind with host networking access doesn't currently work. Almost all distros now use the modern nftables backend for iptables, but Alpine (which this dind image is based on) defaults to the legacy backend. If you run docker run --privileged --net host docker:dind, that Docker instance uses a totally different iptables backend than the host. The two conflict, and containers run inside the dind instance with ports exposed are not accessible on the host unless you manually update the host's iptables config.

You can see an example of this here:

# Host
ubuntu@host:~$ sudo iptables --version
iptables v1.8.7 (nf_tables)
ubuntu@host:~$ sudo iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

# docker:dind
ubuntu@host:~$ sudo docker run --privileged --net host docker:dind iptables --version
iptables v1.8.9 (legacy)
ubuntu@host:~$ sudo docker run --privileged --net host docker:dind iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

Notice how the dind container does not see the host's iptables chains despite having the appropriate privileges.
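
The mismatch above can be checked mechanically by comparing the backend suffix that iptables 1.8+ prints in its version string. The following is a minimal sketch, not something the image ships: iptables_backend and backends_match are hypothetical helper names, and version strings without a "(legacy)" or "(nf_tables)" suffix are treated as unknown.

```shell
#!/bin/sh
# Extract the backend name from an `iptables --version` string,
# e.g. "iptables v1.8.9 (legacy)" -> "legacy".
# iptables_backend is a hypothetical helper, not part of iptables.
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;
  esac
}

# Succeed only if both version strings report the same, known backend.
backends_match() {
  a="$(iptables_backend "$1")"
  b="$(iptables_backend "$2")"
  [ "$a" != "unknown" ] && [ "$a" = "$b" ]
}

# Example usage (needs root and Docker, so it is left commented out):
#   host="$(iptables --version)"
#   dind="$(docker run --rm --privileged --net host docker:dind iptables --version)"
#   backends_match "$host" "$dind" || echo "backend mismatch: $host vs $dind"
```

With the two version strings shown above, backends_match fails, which matches the observed behaviour.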

Luckily, Alpine also includes the nftables variant as iptables-nft, which works as expected when called from inside a container with host networking and interfaces with the host's actual iptables backend:

ubuntu@host:~$ sudo docker run --privileged --net host docker:dind iptables-nft --version
iptables v1.8.9 (nf_tables)
ubuntu@host:~$ sudo docker run --privileged --net host docker:dind iptables-nft -S
# Warning: iptables-legacy tables present, use iptables-legacy to see them
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

The two variants are namespaced and the default is set via symlinks:

ubuntu@host:~$ sudo docker run docker:dind ls -lah /sbin/iptables*
lrwxrwxrwx    1 root     root          23 Sep  3 10:23 /sbin/iptables -> /sbin/xtables-nft-multi
-rwxr-xr-x    1 root     root        6.9K Apr 19 17:59 /sbin/iptables-apply
lrwxrwxrwx    1 root     root          20 Sep  1 01:53 /sbin/iptables-legacy -> xtables-legacy-multi
lrwxrwxrwx    1 root     root          20 Sep  1 01:53 /sbin/iptables-legacy-restore -> xtables-legacy-multi
lrwxrwxrwx    1 root     root          20 Sep  1 01:53 /sbin/iptables-legacy-save -> xtables-legacy-multi
lrwxrwxrwx    1 root     root          17 Sep  1 01:53 /sbin/iptables-nft -> xtables-nft-multi
lrwxrwxrwx    1 root     root          17 Sep  1 01:53 /sbin/iptables-nft-restore -> xtables-nft-multi
lrwxrwxrwx    1 root     root          17 Sep  1 01:53 /sbin/iptables-nft-save -> xtables-nft-multi
lrwxrwxrwx    1 root     root          23 Sep  3 10:23 /sbin/iptables-restore -> /sbin/xtables-nft-multi
lrwxrwxrwx    1 root     root          23 Sep  3 10:23 /sbin/iptables-restore-translate -> /sbin/xtables-nft-multi
lrwxrwxrwx    1 root     root          23 Sep  3 10:23 /sbin/iptables-save -> /sbin/xtables-nft-multi
lrwxrwxrwx    1 root     root          23 Sep  3 10:23 /sbin/iptables-translate -> /sbin/xtables-nft-multi

If we simply update the default iptables symlinks to point to the nftables variants, dind with --net host is able to properly update the host's iptables rules:

for command in iptables iptables-restore iptables-restore-translate iptables-save iptables-translate
do
  ln -sf /sbin/xtables-nft-multi "/sbin/$command"
done
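
To try the symlink swap without touching a real /sbin, the same loop can be wrapped in a function that takes the target directory as an argument, paired with a check that every generic command name now resolves to the nft binary. This is only a sketch: use_nft_backend and uses_nft_backend are hypothetical names, and it assumes readlink is available (it is in BusyBox and coreutils).

```shell
#!/bin/sh
# Repoint the generic iptables command names in a directory at the
# nft multi-call binary. use_nft_backend is a hypothetical helper;
# the directory defaults to /sbin, as in the loop above.
use_nft_backend() {
  sbin_dir="${1:-/sbin}"
  for command in iptables iptables-restore iptables-restore-translate iptables-save iptables-translate
  do
    ln -sf "$sbin_dir/xtables-nft-multi" "$sbin_dir/$command"
  done
}

# Succeed only if every generic command name is a symlink whose
# target ends in xtables-nft-multi.
uses_nft_backend() {
  sbin_dir="${1:-/sbin}"
  for command in iptables iptables-restore iptables-restore-translate iptables-save iptables-translate
  do
    case "$(readlink "$sbin_dir/$command")" in
      *xtables-nft-multi) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Example usage inside the dind container:
#   use_nft_backend /sbin && uses_nft_backend /sbin && iptables --version
```

Keeping the directory as a parameter makes it possible to exercise the logic against a scratch directory before running it in the container.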

Here is a fully working example of Portainer + Docker running in Docker, with Portainer able to create new Docker containers that have ports exposed on the host.

I also added a DOCKER_ENSURE_BRIDGE env var that sets up a separate bridge network from within the container so it doesn't conflict with the host's docker0 network.

docker-compose.yml

version: "3.7"

services:
  dind:
    image: docker:24.0.5-dind
    privileged: true
    network_mode: host
    environment:
      DOCKER_ENSURE_BRIDGE: "dind0:10.32.0.1/16"
    entrypoint: /entrypoint.sh
    command: >
      dockerd
        --bridge dind0
        --data-root /data/data
        --exec-root /data/exec
        --host unix:///data/docker.sock
        --pidfile /data/docker.pid
    volumes:
      - ./entrypoint.sh:/entrypoint.sh
      - ./data/docker:/data

  portainer:
    image: portainer/portainer-ce:2.19.0
    command: -H unix:///docker-data/docker.sock
    ports:
      - "9000:9000"
    volumes:
      - ./data/portainer:/data
      - ./data/docker:/docker-data

entrypoint.sh

#!/bin/sh

# Use nftables as the backend for iptables
for command in iptables iptables-restore iptables-restore-translate iptables-save iptables-translate
do
  ln -sf /sbin/xtables-nft-multi "/sbin/$command"
done

# Ensure that a bridge exists with the given name and IP range, specified in the format "name:ip_range", e.g. "dind0:10.32.0.1/16"
if [ -n "${DOCKER_ENSURE_BRIDGE:-}" ]
then
  bridge="${DOCKER_ENSURE_BRIDGE%%:*}"
  ip_range="${DOCKER_ENSURE_BRIDGE#*:}"
  # Use POSIX redirection; "&>" is a bash extension and is not portable under /bin/sh
  if ! ip link show "${bridge}" >/dev/null 2>&1
  then
    ip link add "${bridge}" type bridge
    ip addr add "${ip_range}" dev "${bridge}"
    ip link set "${bridge}" up
  fi
  echo "Bridge ${bridge} exists:"
  ip addr show "${bridge}"
fi

# Quote "$@" so arguments containing spaces are passed through intact
exec dockerd-entrypoint.sh "$@"
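
The "name:ip_range" split in the entrypoint relies on POSIX parameter expansion, and it can be tried on its own. In this sketch bridge_name, bridge_range and valid_bridge_spec are hypothetical helper names, and the shape check is an addition that the entrypoint above does not perform. Because the split happens at the first colon, an IPv6 range after the name stays intact.

```shell
#!/bin/sh
# Split a "name:ip_range" spec using POSIX parameter expansion.
# bridge_name and bridge_range are hypothetical helper names.

# "%%:*" strips the longest suffix starting at the first ":".
bridge_name() { printf '%s\n' "${1%%:*}"; }

# "#*:" strips the shortest prefix ending at the first ":".
bridge_range() { printf '%s\n' "${1#*:}"; }

# Basic shape check: non-empty name, a ":" separator, and a range
# containing "/". Without a ":" both expansions return the input
# unchanged, so a check like this catches that case early.
valid_bridge_spec() {
  case "$1" in
    ?*:?*/?*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example usage:
#   spec="dind0:10.32.0.1/16"
#   valid_bridge_spec "$spec" && echo "$(bridge_name "$spec") $(bridge_range "$spec")"
```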

I've hacked this together by injecting a new entrypoint via a volume that wraps the current entrypoint. This works for our use case, but it would be great if the official image handled this. Would you accept a PR to switch the iptables backend to nftables directly in this Dockerfile?

If so, would you also be interested in adding the DOCKER_ENSURE_BRIDGE utility to the current entrypoint?

Thanks!

@lukechilds
Author

@tianon I added a PR here if it's useful: #444

@tianon
Member

tianon commented Dec 11, 2023

See #437 (comment)

@tianon
Member

tianon commented Dec 13, 2023

Closing in favor of #437 -- unfortunately, more than that is going to be out of scope because I honestly wouldn't recommend using Docker-in-Docker with --network=host, but you do seem savvy enough to understand the risks 😅

@tianon tianon closed this as completed Dec 13, 2023