Support for containers running on macvlan networks #96

Open
lox opened this issue May 14, 2023 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@lox

lox commented May 14, 2023

I have containers that run on macvlan networks (to allow broadcast/multicast easily). I'd love to be able to use whalewall to firewall off web admin ports, but allow access to things like DNS.

An example might be AdGuard Home, which needs 53/tcp and 53/udp accessible, but I'd like to limit access to 80/443 to a specific network only (the traefik network rather than the macvlan network).

Might something like that be in-scope for whalewall?
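
For illustration, the policy described above could be sketched as a small nftables ruleset. This is only a sketch of the intent, not something whalewall generates: the table name `adguard_demo`, the helper `render_input_rules`, and the subnet passed to it are hypothetical, and the rules would have to be loaded in a namespace that actually sees the macvlan traffic.

```shell
#!/bin/sh
# Hypothetical helper: print an nftables ruleset that leaves DNS (and everything
# else) reachable, but only lets one subnet reach the web admin ports.
# $1 is the subnet allowed to reach 80/443 (an assumed example value).
render_input_rules() {
    admin_subnet="$1"
    cat <<EOF
table inet adguard_demo {
    chain input {
        type filter hook input priority 0; policy accept;
        # Admin ports: accept from the trusted subnet, drop from everyone else
        ip saddr ${admin_subnet} tcp dport { 80, 443 } accept
        tcp dport { 80, 443 } drop
    }
}
EOF
}
```

Something like `render_input_rules 172.18.0.0/16 | nft -f -` would load it, where `172.18.0.0/16` stands in for the traefik network's subnet.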

@capnspacehook
Owner

capnspacehook commented May 14, 2023

I've never used macvlan networks, but that should be possible now. Try specifying the network on the output rules you want to limit to a single network, and let me know if that works.

@capnspacehook capnspacehook added the question Further information is requested label May 14, 2023
@lox
Author

lox commented May 14, 2023

Interesting, looks like macvlan containers aren't subject to a lot of the standard iptables chains and need something like this: https://github.com/deitch/ctables

@capnspacehook capnspacehook added enhancement New feature or request and removed question Further information is requested labels Jun 4, 2023
@capnspacehook capnspacehook added the help wanted Extra attention is needed label Jun 17, 2023
@esev

esev commented Aug 8, 2023

I've been looking into something similar and discovered whalewall. I wonder if it'd be possible to move all the rules into the container's network namespace. The NetworkSettings.SandboxKey field, from docker inspect, seems to point to a file related to the network namespace. I haven't experimented enough yet, but I wonder if opening that file is all that would be needed to use it with nftables.WithNetNSFd.

Bonus: no cleanup would be needed when a container exits, as all the rules should go away along with the namespace when the namespace is destroyed.
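
A rough command-line analogue of that idea, for experimenting before touching any Go code: look up the SandboxKey with `docker inspect`, then run `nft` inside that namespace with `nsenter` from util-linux. The helper names `sandbox_key` and `nft_in_container` are made up for this sketch, and it assumes the SandboxKey path is usable as a network namespace file and that the commands run as root on the host.

```shell
#!/bin/sh
# Print the network namespace file Docker reports for a container.
sandbox_key() {
    docker inspect --format '{{.NetworkSettings.SandboxKey}}' "$1"
}

# Run an nft command inside that container's network namespace.
# Usage: nft_in_container <container> <nft arguments...>
nft_in_container() {
    container="$1"
    shift
    key="$(sandbox_key "$container")" || return 1
    # The key may be empty, for example if the container is not running
    if [ -z "$key" ]; then
        echo "no SandboxKey for $container" >&2
        return 1
    fi
    nsenter --net="$key" nft "$@"
}
```

For example, `nft_in_container adguard list ruleset` would show the rules in the namespace of a container named `adguard` (a placeholder name).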

@capnspacehook
Owner

This should be possible. The only downside is the nftables rules could be viewed or modified by the root user in the Docker container, since the rules would be in the container's namespace instead of the host's.

@esev

esev commented Aug 23, 2023

> the only downside is the nftables rules could be viewed or modified by the root user in the Docker container,

I think this would only be possible for containers with the NET_ADMIN capability. But I agree, the more generic solution would be the one implemented today.
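
One way to check this from inside a container is to look at the effective capability mask in /proc/self/status and test the CAP_NET_ADMIN bit (capability number 12 in linux/capability.h). A minimal sketch; the function names are hypothetical:

```shell
#!/bin/sh
# Succeeds if the given hex capability mask has CAP_NET_ADMIN (bit 12) set.
mask_has_net_admin() {
    [ $(( (0x$1 >> 12) & 1 )) -eq 1 ]
}

# Print the effective capability mask (CapEff) as seen by a child process,
# which inherits it from the calling shell in the usual case.
current_cap_mask() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}
```

Running `mask_has_net_admin "$(current_cap_mask)" && echo "NET_ADMIN present"` in a container started without `cap_add: NET_ADMIN` should print nothing, which matches the try_firewall_wipe result in the compose experiment further down.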

@Jip-Hop

Jip-Hop commented Feb 7, 2024

> Bonus: no cleanup would be needed when a container exits, as all the rules should go away along with the namespace when the namespace is destroyed.

I did a little experiment with firewall rules in the container namespace. It is inspired by how Kubernetes/Podman pods work.

Upsides:

  1. No need to mount the Docker socket
  2. No Docker polling/event listening
  3. Rules are guaranteed to be set up before the restricted container starts
  4. Plain nftables rules
  5. Cleans up after itself

Downsides:

  1. More yaml
  2. Extra containers (1 stopped + 1 running per network namespace)
services:

  pod:
    # Long running process to keep the network namespace alive
    # Stopping (or restarting) this container will kill the network in dependent containers
    command: sleep infinity
    image: alpine
    # Use a non-root user
    user: 1000:1000
    # Use init, as sleep isn't intended to run as pid 1
    # https://daveiscoding.com/why-do-you-need-an-init-process-inside-your-docker-container-pid-1
    init: true
    # Drop all capabilities and make read-only, this container does nothing
    cap_drop:
      - ALL
    read_only: true
    # Bonus: block DNS tunneling by disabling DNS forwarding (forward to 0.0.0.0)
    # Containers can still resolve hostnames of containers on the same network (e.g. hostname_demo),
    # but they can't resolve public domains such as google.com
    # However, this also affects /etc/resolv.conf inside the firewall container
    # https://github.com/moby/moby/issues/19474#issuecomment-276406305
    dns: 0.0.0.0

  firewall:
    # Use firewalld, or any other image with nftables installed
    image: quay.io/firewalld/firewalld
    # Run as root user
    user: 0:0
    # Give only the required NET_ADMIN capability
    cap_drop:
      - ALL
    cap_add: 
      - NET_ADMIN
    depends_on:
      pod:
        # Compose should restart the firewall after it updates pod
        # This applies to explicit restart controlled by a Compose operation only!
        restart: true
        condition: service_started
    # Join the network namespace of pod
    network_mode: service:pod
    command: >
      sh -c "
            echo 'Set up a new table and chain...' &&
            nft add table inet filter &&
            nft add chain inet filter output { type filter hook output priority 0\\; } &&
            
            echo 'Allow private network ranges...' &&
            nft add rule inet filter output ip daddr 10.0.0.0-10.255.255.255 accept &&
            nft add rule inet filter output ip daddr 172.16.0.0-172.31.255.255 accept &&
            nft add rule inet filter output ip daddr 192.168.0.0-192.168.255.255 accept &&
            nft add rule inet filter output ip daddr 127.0.0.0-127.255.255.255 accept &&
            
            echo 'Allow 1.1.1.1 as example...' &&
            nft add rule inet filter output ip daddr 1.1.1.1 accept &&
            
            echo 'Drop connections to all other IP addresses...' &&
            nft add rule inet filter output drop &&

            nft list table inet filter
          "

  restricted:
    image: alpine
    # Start this container once the firewall has been set up
    depends_on:
      firewall:
        restart: true
        condition: service_completed_successfully
    network_mode: service:pod
    # Run as regular user
    user: 1000:1000
    init: true
    # Without any capabilities
    cap_drop:
      - ALL
    # Show the firewall is effective
    command: >
      sh -c "
            echo 'Resolving internal hostname works:'
            ping hostname_demo -c 1
            echo
            echo 'Resolving public domain fails:'
            wget -T 1 -t 1 google.com -O - | head
            echo
            echo 'Accessing allowed IP address works:'
            wget 1.1.1.1 -O - | head -n 21 | tail -n 17
            echo
            echo 'Accessing any other IP address fails:'
            wget -T 1 -t 1 40.89.244.232 -O - | head
            exit 0
          "

  # Bonus: another container on the same network to demo resolving internal DNS names
  hostname_demo:
    image: alpine
    user: 1000:1000
    init: true
    cap_drop:
      - ALL
    read_only: true
    command: sleep infinity

  # Bonus: try wiping the firewall rules as root user
  try_firewall_wipe:
    image: quay.io/firewalld/firewalld
    depends_on:
      restricted:
        restart: true
        condition: service_completed_successfully
    network_mode: service:pod
    user: 0:0
    init: true
    cap_drop:
      - ALL
    # Root can't wipe the firewall rules without the NET_ADMIN capability
    command: >
      sh -c "
            echo 'Try wiping the firewall as root user...'
            nft flush ruleset
            echo 'Demo is finished!'
          "
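
As a variation on the firewall service above, the chain of `nft add` commands could be rendered as one ruleset and loaded with `nft -f`, which applies the whole input as a single transaction. This is a sketch under the assumption that CIDR notation and a `policy drop` chain are acceptable replacements for the address ranges and the trailing `drop` rule; `render_output_rules` is a hypothetical helper name.

```shell
#!/bin/sh
# Print an nftables ruleset mirroring the demo: allow private and loopback
# ranges plus any extra addresses given as arguments, drop everything else.
render_output_rules() {
    printf '%s\n' 'table inet filter {'
    printf '%s\n' '    chain output {'
    printf '%s\n' '        type filter hook output priority 0; policy drop;'
    printf '%s\n' '        ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8 } accept'
    # One accept rule per extra allowed destination
    for addr in "$@"; do
        printf '        ip daddr %s accept\n' "$addr"
    done
    printf '%s\n' '    }'
    printf '%s\n' '}'
}
```

Inside the firewall container this would be used as `render_output_rules 1.1.1.1 | nft -f -`.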
