Idea: network isolation between tests #1475

yanns · 2024-04-30T09:03:57Z

use-case

My tests start several servers, each on a randomly picked free TCP port. I first need to pick a port and only then start the HTTP server, because the port is also used in some shared configuration, so I cannot "simply" use the 127.0.0.1:0 approach. I'm using port-selector for that. I could also use reserve-port, but it does not help when running with nextest.

Because nextest runs tests in parallel in separate processes, two tests can end up picking the same port.

My current mitigation is to use the retry mechanism to rerun the tests that fail because they picked identical ports. The tests themselves are not flaky, but running them in parallel makes them flaky.
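
To make the race concrete, here is a minimal Python sketch. It is not how port-selector or nextest work internally, and the function names are hypothetical. Binding to port 0 lets the kernel choose a free port, but once the probe socket is closed nothing reserves that number, so another test process may be handed the same port before the real server binds to it.

```python
import socket


def pick_free_port() -> int:
    """Ask the kernel for a free TCP port, then release it immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0 means "any free port"
        return probe.getsockname()[1]


def start_listener(port: int) -> socket.socket:
    """Bind the real server socket later, using the port chosen earlier."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    return server


if __name__ == "__main__":
    port = pick_free_port()
    # Race window: between pick_free_port() returning and start_listener()
    # binding, another process may be given the same port.
    listener = start_listener(port)
    print(f"listening on 127.0.0.1:{listener.getsockname()[1]}")
    listener.close()
```

Within a single process the window is tiny, which is presumably why the collisions only show up once many test processes run in parallel.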

Idea

One possible idea would be to use network isolation on Linux, so that each process can pick the same ports without conflict.

Possible issues

Network isolation of this kind only works on Linux (I haven't checked other OSs).
It adds complexity to the nextest code, and configuring the namespaces can add more, for example if tests need to access the internet.
Creating new namespaces requires privileges.


NobodyXu · 2024-04-30T09:21:32Z

> Creating new namespaces requires privileges

On Linux, if unprivileged user namespaces are enabled, you can create the namespaces and get all of that isolation without privileges.
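
A rough way to check whether unprivileged user namespaces look enabled is to read the relevant sysctl files. The sketch below is best-effort only. It assumes the upstream `user/max_user_namespaces` knob and the Debian/Ubuntu-specific `kernel/unprivileged_userns_clone` knob, and the function name is hypothetical. Other mechanisms (AppArmor policies, seccomp filters inside containers) can still block namespace creation, so the only definitive check is to try creating one.

```python
from pathlib import Path


def userns_probably_allowed(proc_sys: Path = Path("/proc/sys")) -> bool:
    """Best-effort guess at whether unprivileged user namespaces are enabled."""

    def read_int(relative: str):
        try:
            return int((proc_sys / relative).read_text().strip())
        except (OSError, ValueError):
            return None  # knob missing or unreadable

    # Upstream knob: missing (not Linux, or a very old kernel) or 0 means
    # that no user namespaces can be created.
    limit = read_int("user/max_user_namespaces")
    if limit is None or limit == 0:
        return False
    # Distribution-specific knob: 0 disables unprivileged creation.
    if read_int("kernel/unprivileged_userns_clone") == 0:
        return False
    return True


if __name__ == "__main__":
    print("unprivileged user namespaces look enabled:", userns_probably_allowed())
```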

sunshowers · 2024-05-06T18:48:24Z

Hi --

The immediate problem you have is probably easiest to solve via test groups: https://nexte.st/book/test-groups

For network isolation, I'm not completely opposed to it, and there are related issues where something other than execing the test process would make sense. (For example, see #1371.)

But I don't know how complex it's going to be, especially handling all the different failure modes (what if the system doesn't allow creating unprivileged user namespaces?)
Are there alternatives like systemd-run --user --scope (which creates a cgroup) that can be helpful here?

I think the best way to get started would be by prototyping your solution using a target runner: https://nexte.st/book/target-runners. A target runner is a custom script or binary that gets invoked separately for each test. Hopefully it should be possible to build a proof of concept with that.
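
As a starting point for such a prototype, here is a minimal sketch of a target runner written in Python. It is untested against nextest and rests on several assumptions: the util-linux `unshare` tool and the iproute2 `ip` tool are installed, unprivileged user namespaces are allowed, and the runner is invoked with the test binary followed by its arguments. The script and function names are hypothetical.

```python
#!/usr/bin/env python3
import os
import sys

# Inside a new network namespace the loopback interface starts down, so bring
# it up before exec'ing the test binary. "$@" expands to the test binary and
# its arguments.
_INNER = 'ip link set lo up && exec "$@"'


def isolated_command(test_argv: list[str]) -> list[str]:
    """Wrap a test command so it runs in its own user + network namespace."""
    return [
        "unshare",
        "--user", "--map-root-user",  # unprivileged user namespace, mapped to root
        "--net",                      # fresh network namespace
        "sh", "-c", _INNER, "sh",     # the second "sh" becomes $0
        *test_argv,
    ]


def main(argv: list[str]) -> None:
    # Replace this process with the wrapped command so that exit codes and
    # signals pass straight through to the caller.
    cmd = isolated_command(argv)
    os.execvp(cmd[0], cmd)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

It could be wired up through Cargo's runner setting, for example by pointing `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER` at the script. Two caveats: the test sees itself as root inside the namespace, and it has no network access beyond loopback.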

NobodyXu · 2024-05-07T00:16:21Z

I think systemd-nspawn can be used to sandbox tests on Linux.

It should be fairly easy, with no change required to nextest: it supports --as-pid2, so that systemd-nspawn would run as PID 1 inside the container and reap children, while nextest would run as PID 2.

It also has options for networking, such as setting up a private network namespace via --private-network, setting up network bridges, etc.

Though it looks like you would have to use --bind-ro to manually mount stuff into the container.

sunshowers · 2024-05-26T23:44:15Z

Thanks @NobodyXu. I think testing these strategies out in a target runner would be the best next step. I don't have the time to work on this myself, but I'll open the floor to contributions.

yanns · 2024-05-27T16:28:41Z

For info, it seems that https://maelstrom-software.com/ embraces the idea of one container per test. I have not had the opportunity to try it out yet.

yanns · 2024-05-28T14:13:22Z

I've played a bit with systemd-nspawn, but I could not manage to start it as a normal user (not root).

NobodyXu · 2024-05-28T14:45:33Z

> I've played a bit with systemd-nspawn, but I could not manage to start it as a normal user (not root).

I think you would need to enable user namespace and use it?

yanns · 2024-05-28T15:46:55Z

> > I've played a bit with systemd-nspawn, but I could not manage to start it as a normal user (not root).
>
> I think you would need to enable user namespace and use it?

I'm trying, but without success:

$ systemd-nspawn  --private-users=yes --private-users-ownership=auto --as-pid2 'echo hello'
Need to be root.

NobodyXu · 2024-05-29T03:37:01Z

According to https://wiki.archlinux.org/title/systemd-nspawn#Unprivileged_containers, systemd-nspawn supports unprivileged containers, but it has to be spawned by root.

So I was wrong about that.

sunshowers · 2024-05-29T03:51:08Z

Is systemd-run an option? I thought I could get it to work as a user.

NobodyXu · 2024-05-29T04:05:31Z

I think podman might be another option: it supports non-root mode, so you don't have to be root to create an unprivileged container.

NobodyXu · 2024-05-29T04:06:00Z

Or you could also try https://firejail.wordpress.com/

yanns · 2024-05-29T07:35:17Z

We could also check how https://maelstrom-software.com/ does it.

sunshowers · 2024-05-29T20:39:40Z

> Or you could also try https://firejail.wordpress.com/

Oh this is good, I've used firejail and it works very well. I think this would be great as part of a library of target runners.
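
A firejail-based target runner could be as small as the sketch below. It assumes firejail is installed and relies on its `--net=none` option, which gives the process a new network namespace containing only a loopback interface. The function name is hypothetical, and this has not been tried with nextest.

```python
#!/usr/bin/env python3
import os
import sys


def firejail_command(test_argv: list[str]) -> list[str]:
    """Wrap a test command so firejail runs it in an unconnected network namespace."""
    # "--" ends firejail's own options, so the test's flags are passed through.
    return ["firejail", "--quiet", "--net=none", "--", *test_argv]


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = firejail_command(sys.argv[1:])
    # Replace this process so the test's exit code reaches the caller.
    os.execvp(cmd[0], cmd)
```

firejail's default profile applies further restrictions beyond networking, so a real runner may want `--noprofile` or a dedicated profile.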

PegasusPlusUS · 2024-11-04T20:52:59Z

Hello, I saw the 'help wanted' tag and just read this thread. I have a simple method for selecting a TCP listen port, and I wrote a simple program to check that it works.

My method is to first connect to somewhere: a server on the LAN, localhost, or 8.8.8.8:53 (a TCP DNS query). The OS gives each client connection a unique local port. Usually clients only use that port to connect outward, which wastes the resource a bit. :) But the client can also listen on that port, so each test gets its own listen port and they will not conflict with each other.

Here is a simple Python program that verifies this:

import socket

def start_client():
    # Create a socket and connect to the server
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('8.8.8.8', 53))
    local_port = client_socket.getsockname()[1]
    print(f"Connected to server, local port is {local_port}")

    # Attempt to bind and listen on the same local port
    try:
        client_bind_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client_bind_socket.bind(('localhost', local_port))
        client_bind_socket.listen(1)
        print(f"Client is now listening on local port {local_port}")
    except Exception as e:
        print(f"Failed to listen on local port {local_port}: {e}")

    client_socket.close()

if __name__ == "__main__":
    start_client()

sunshowers added the "help wanted" label on May 26, 2024.
