What is gVisor? -- explain where runsc is at in the diagram... #11088

Open
JustinCappos opened this issue Oct 29, 2024 · 6 comments

@JustinCappos

I'm trying to dive deeper and understand the project. The diagram above "What is runsc?" shows a set of processes that gVisor starts but it doesn't show runsc. Does this start and then exit? Where would this be in the diagram?

On a related note, I'm curious what the diagram would look like after fork + exec occurs: where are these calls made, and what happens in that case?

(I'll look at the codebase to try to understand this, but other newbs might have similar confusion.)

@manninglucas manninglucas self-assigned this Oct 29, 2024
@manninglucas
Contributor

manninglucas commented Oct 29, 2024

Hi @JustinCappos, good question! We've been meaning to update that page for some time, and this is a good prompt to actually get it done :). While we work on that, you may be interested in the slides from a talk I gave earlier this year, which have more in-depth diagrams illustrating how gVisor works.

@EtiennePerot
Contributor

EtiennePerot commented Oct 30, 2024

Hi @JustinCappos. On top of the link @manninglucas provided, I also suggest looking at a recent blog post from Dangerzone on its integration of gVisor. While it doesn't map to the typical gVisor use-case of using runsc as a container runtime, it does go into detail about which processes exist in the steady state once the gVisor sandbox is created, with cute diagrams too.

To answer your question more precisely: runsc is an entrypoint and management command-line tool for OCI-managed containers, following the OCI runtime spec. When starting a new sandbox, the initial runsc create and runsc start processes are not long-lived, as per the spec. They do fork and re-execute the runsc binary several times as part of this procedure. The two "steady-state" processes that keep running in the background after runsc start terminates are runsc-sandbox and runsc-gofer. Each of them runs a subcommand of the runsc binary and performs its own function.
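
The "short-lived launcher, long-lived children" shape described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not how runsc is implemented (runsc is written in Go and re-executes its own binary). `CHILD_CODE` and `start_long_lived_child` are hypothetical names, and the child simply sleeps as a stand-in for a long-running subcommand.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-lived subcommand; it only sleeps.
CHILD_CODE = "import time; time.sleep(30)"


def start_long_lived_child():
    """Spawn a background child the way a short-lived launcher might."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        # Put the child in its own session so that terminal signals aimed
        # at the launcher do not reach it (POSIX only).
        start_new_session=True,
    )


if __name__ == "__main__":
    child = start_long_lived_child()
    print(f"launcher exiting, child {child.pid} keeps running")
```

The launcher can return right after `Popen`, and the child keeps running. That mirrors runsc start terminating while runsc-sandbox and runsc-gofer remain.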

@JustinCappos
Author

Thanks, I took a look. This really helps!

In the article I see:

However, it’s heavily restricted with a strict seccomp filter (that’s why system calls like open, socket, or exec are not allowed).

One question I had is how system calls that must be applied to the local process are implemented. I totally understand how proxying I/O to the Gofer works, but what if the running application wants to fork, exec, etc.? I would think this needs to be done by the isolated process directly, but from what I understand from reading the article, those calls are blocked by the Sentry...

@EtiennePerot
Contributor

EtiennePerot commented Oct 31, 2024

If the sandboxed application execs, it does so only within the context of the gVisor kernel, not the host kernel. gVisor itself implements the logic necessary for the sandboxed application to think it has exec'd, without any exec system call actually happening on the host. Each "process" exists only as a logical concept within the gVisor kernel's data structures, not in the host Linux kernel.

You can try it out by doing something like this:

# Start a gVisor sandbox
$ docker run --rm -it --runtime=runsc ubuntu bash

# Fork/exec two subprocesses within the sandbox
root@385653ae75e2:/# sleep 5m &
[1] 4
root@385653ae75e2:/# sleep 6m &
[2] 5

# See them running within the sandbox
root@385653ae75e2:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.3  0.0  12696  5052 ?        Ss   01:15   0:00 bash
root         4  0.0  0.0  10860  4532 ?        S    01:15   0:00 sleep 5m
root         5  0.0  0.0  10860  2976 ?        S    01:15   0:00 sleep 6m
root         6  0.0  0.0  15132  7020 ?        R    01:15   0:00 ps aux

Then, while this is running, you can run this outside the sandbox (directly on the host):

# The "sleep" processes do not exist from the host Linux kernel's perspective:
$ ps aux | grep sleep
eperot    447973  0.0  0.0   9244  2080 pts/5    S+   21:15   0:00 grep --color=auto sleep

# Only the `runsc` processes exist, from the host Linux kernel's perspective:
$ ps aux | grep runsc
eperot    447192  0.0  0.0 1782440 39596 pts/3   Sl+  21:15   0:00 docker run --rm -it --runtime=runsc ubuntu bash
root      447245  0.0  0.0 1255128 24484 ?       Ssl  21:15   0:00 runsc-gofer --systemd-cgroup=true --root=/var/run/docker/runtime-runc/moby --log=/run/containerd/io.containerd.runtime.v2.task/moby/385653ae75e22065d3385130b54353f6761febf10c658a09d7fe6ec06a06a6cf/log.json --log-format=json --log-fd=3 gofer --bundle=/run/containerd/io.containerd.runtime.v2.task/moby/385653ae75e22065d3385130b54353f6761febf10c658a09d7fe6ec06a06a6cf --gofer-mount-confs=lisafs:self,lisafs:none,lisafs:none,lisafs:none --io-fds=6,7,8,9 --mounts-fd=5 --spec-fd=4 --sync-nvproxy-fd=-1 --sync-userns-fd=-1 --proc-mount-sync-fd=16 --apply-caps=false --setup-root=false
root      447250  9.8  0.0 3466188 38284 pts/4   Ssl+ 21:15   0:10 runsc-sandbox --root=/var/run/docker/runtime-runc/moby --log=/run/containerd/io.containerd.runtime.v2.task/moby/385653ae75e22065d3385130b54353f6761febf10c658a09d7fe6ec06a06a6cf/log.json --log-format=json --systemd-cgroup=true --log-fd=3 boot --apply-caps=false --bundle=/run/containerd/io.containerd.runtime.v2.task/moby/385653ae75e22065d3385130b54353f6761febf10c658a09d7fe6ec06a06a6cf --controller-fd=11 --cpu-num=20 --dev-io-fd=-1 --gofer-filestore-fds=8 --gofer-mount-confs=lisafs:self,lisafs:none,lisafs:none,lisafs:none --io-fds=4,5,6,7 --mounts-fd=9 --setup-root=false --spec-fd=12 --start-sync-fd=10 --stdio-fds=13,14,15 --total-host-memory=67078000640 --total-memory=67078000640 --product-name=21DDS2VM00 --host-shmem-huge=never --proc-mount-sync-fd=23 385653ae75e22065d3385130b54353f6761febf10c658a09d7fe6ec06a06a6cf
eperot    448358  0.0  0.0   9244  2080 pts/5    S+   21:16   0:00 grep --color=auto runsc
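
The behaviour shown in the transcripts above can be modelled with a toy user-space "kernel" that keeps its process table in an ordinary dictionary. This is a deliberately simplified sketch, not gVisor's code: `ToyKernel`, `ToyTask`, and their methods are hypothetical names. The point is only that `fork` and `execve` here change in-memory bookkeeping and never ask the host to create a process.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyTask:
    pid: int
    parent: int
    argv: List[str]


class ToyKernel:
    """Hypothetical user-space kernel that tracks 'processes' in a dict."""

    def __init__(self, init_argv: List[str]) -> None:
        self._next_pid = 1
        self.tasks: Dict[int, ToyTask] = {}
        self._add(parent=0, argv=init_argv)

    def _add(self, parent: int, argv: List[str]) -> int:
        pid = self._next_pid
        self._next_pid += 1
        self.tasks[pid] = ToyTask(pid, parent, list(argv))
        return pid

    def fork(self, pid: int) -> int:
        # The "child" is a new table entry; no host process is created.
        return self._add(parent=pid, argv=self.tasks[pid].argv)

    def execve(self, pid: int, argv: List[str]) -> None:
        # Replace the program image of an existing entry in place.
        self.tasks[pid].argv = list(argv)

    def ps(self) -> List[Tuple[int, str]]:
        return [(pid, " ".join(t.argv)) for pid, t in sorted(self.tasks.items())]


if __name__ == "__main__":
    kernel = ToyKernel(["bash"])
    child = kernel.fork(1)
    kernel.execve(child, ["sleep", "5m"])
    print(kernel.ps())  # [(1, 'bash'), (2, 'sleep 5m')]
```

A `ps` run on the host would only ever see the single Python process running this model, which is the same relationship as the host seeing only runsc-sandbox and runsc-gofer.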

@rennergade

Thanks for the replies @EtiennePerot @manninglucas

I'm a student working with Justin and had a few more questions. Is gVisor using software fault isolation (SFI) to run multiple applications in the same process? Are there any diagrams specifically about fork, or can you point us to the source code?

Thanks!

@EtiennePerot
Contributor

EtiennePerot commented Oct 31, 2024

fork (or more specifically clone(2) without the CLONE_VM flag, based on the question) is implemented in a platform-specific manner. Look at Platform.NewAddressSpace and at each platform's implementation of the AddressSpace interface.
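
As a rough illustration of the semantic difference that CLONE_VM makes, here is a toy Python model. It is not gVisor's implementation, which is written in Go and differs per platform. `ToyPlatform`, `ToyAddressSpace`, and `toy_clone` are hypothetical names, and the eager copy stands in for whatever copy-on-write mechanism a real kernel would use.

```python
CLONE_VM = 0x00000100  # value of the CLONE_VM flag on Linux


class ToyAddressSpace:
    """Hypothetical stand-in for a platform-provided address space."""

    def __init__(self) -> None:
        self.pages = {}  # virtual address -> page contents (bytes)


class ToyPlatform:
    """Hypothetical platform that hands out fresh address spaces."""

    def new_address_space(self) -> ToyAddressSpace:
        return ToyAddressSpace()


def toy_clone(platform: ToyPlatform, parent_as: ToyAddressSpace, flags: int) -> ToyAddressSpace:
    if flags & CLONE_VM:
        # Thread-like clone: the child shares the parent's address space.
        return parent_as
    # fork-like clone: the child gets a new address space holding a copy.
    # (Eager copy for simplicity; real kernels use copy-on-write.)
    child_as = platform.new_address_space()
    child_as.pages = dict(parent_as.pages)
    return child_as
```

With `CLONE_VM` set, writes through the returned object are visible to the parent. Without it, the child's writes land in its own copy and the parent's pages are unchanged.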
