Help me understand the underlying file system of kurtosis on kubernetes #946
-
I'm curious what the underlying file system of Kurtosis looks like. In case these files don't live in PVs, then as far as I know they are living in /tmp and would not survive a reboot. It also seems to me that each process is just spun up in a pod rather than a StatefulSet, so the process can't be rescheduled into another pod, nor can it easily be 'retrieved' after it's been killed. That's a separate question though, and outside the scope of this one.
-
Hey @barnabasbusa , great question! I'll first give some history, and then explain how it's led us to the current world.

When Kurtosis started, we were focused on ephemeral testing environments in CI on Docker. Users would create one enclave per test, run the test, and then throw away the enclave. We also focused very heavily on reproducibility, since we've experienced firsthand the hell of flaky tests. This meant that a single Git commit would have all the resources (including test logic & files) necessary to run the test, and the test itself would be described in the predecessor to Kurtosis Starlark. In this world, persistence wasn't necessary: we'd load all the resources in the exact same way for each test, and throw everything away when the test was done. (This is also why enclaves aren't restartable, as you correctly mentioned in #853 - it wasn't much needed at the time.)

Kurtosis moves files around through files artifacts, which are basically just TGZs coordinated by the Kurtosis API container (APIC) running inside the enclave. In Docker, we just write these TGZs to the API container's container filesystem (since the container filesystem persists through restarts).

When we added Kubernetes support, it seemed like the analog was to have a PersistentVolume for each APIC. However, we quickly hit limits using DigitalOcean's managed k8s: PVs would take on the order of minutes to create on-demand, and you were limited to 10 of them. We then learned that Kubernetes seems to want a small number of PersistentVolumes, with lots of PersistentVolumeClaims made on top of them, so we switched to the user configuring a single PV for all of Kurtosis, with each APIC making a PVC for a subpath of the PV to store its files artifacts. But Kurtosis was still mostly used for testing usecases, and we realized that the requirement to bring a PV was providing more friction than benefit (the user had to create the PV, and then plug it into the Kurtosis config), so we simply had the APIC write the files artifacts to its own container filesystem.

This was simpler, but has exactly the downside you described: those files won't survive a reboot. That was fine in the world where Kurtosis was used for ephemeral test enclaves, but since then Kurtosis usage has started to expand upstream to dev/prototyping (using a Kurtosis enclave to spin up resources that will live for a while, and continually iterating against them) and downstream towards prod (spinning up an eternal enclave that receives constant updates and has durable data). Which basically means that the product needs to advance! We've been anticipating and planning for this, and this is what's going to be happening over the next several months to cope with it.
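To make the current picture concrete, here's a rough sketch of how you can see today's behavior for yourself with plain kubectl (namespace and pod names below are placeholders, and the exact naming conventions depend on your Kurtosis version, so go by whatever `kubectl get namespaces` actually shows):

```bash
# Kurtosis creates the enclave's resources inside a dedicated namespace;
# find the one for your enclave (naming is version-dependent)
kubectl get namespaces

# Inside that namespace there is one pod for the API container (APIC),
# plus one pod per service in the enclave
kubectl get pods -n <enclave-namespace>

# The APIC pod's Volumes section shows where files artifacts live today:
# plain container-filesystem / emptyDir storage, not a PersistentVolumeClaim
kubectl describe pod <apic-pod-name> -n <enclave-namespace>
```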
Lastly, you mentioned, "how can I explore the Kurtosis filesystem?" If you wanted to go very deep, you could shell into the APIC itself, though we don't recommend doing this because it's basically messing with the internals of Kurtosis. The much cleaner way would be to go through the Kurtosis CLI: inspect the enclave to see which files artifacts it contains, and then inspect the files inside a given artifact via the CLI's files commands (a sketch follows below).
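For concreteness, a rough sketch of that CLI-based exploration (the enclave and artifact names are made up, and the exact `files` subcommands can vary between Kurtosis versions, so check `kurtosis files --help` on yours):

```bash
# List enclaves, then inspect one; the output includes a section listing
# the files artifacts registered in the enclave
kurtosis enclave ls
kurtosis enclave inspect my-enclave

# Look inside a files artifact (subcommand availability may vary by version;
# "my-artifact" and the file path are placeholders)
kurtosis files inspect my-enclave my-artifact
kurtosis files inspect my-enclave my-artifact path/inside/artifact.txt

# Or download the artifact's contents locally to poke at with normal tools
kurtosis files download my-enclave my-artifact ./my-artifact-contents
```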
If this answered your question, would you mind marking this as the answer? Thanks!
-
And created #949 to track your follow-on question!
-
Yes, I can confirm that these are located on the local machine's filesystem at path
In the case of NFS: a PersistentVolume (PV) is an atomic abstraction; you cannot subdivide it across multiple claims. A PersistentVolumeClaim (PVC) specifies the desired access mode and storage capacity, and currently, based on only these two attributes, a PVC is bound to a single PV. Once a PV is bound to a PVC, that PV is essentially tied to the PVC's project and cannot be bound to by another PVC. (Source) In the case of a cloud-provided PV, it seems like some cloud providers do not support ReadWriteMany at all (a quick way to check what your own cluster's PVs and StorageClasses support is sketched below).
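A minimal sketch of how you could check this on your own cluster with stock kubectl (the output will obviously be cluster-specific):

```bash
# The ACCESS MODES column shows RWO/ROX/RWX per PV, and the CLAIM column
# shows the single PVC each bound PV is tied to
kubectl get pv

# The provisioner behind each StorageClass determines whether ReadWriteMany
# is even possible (file/NFS/CephFS-style backends generally can,
# plain block-storage backends generally can't)
kubectl get storageclass
```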
Creating a Kubernetes cluster that supports ReadWriteMany might require a special file system in the backend, such as Ceph or Longhorn. What kind of cloud offering do you plan to offer? Ethereum (and other blockchain products) requires very low-latency storage solutions; otherwise the clients might fall behind the chain and end up trying to sync to head indefinitely.

My takeaway: as I currently understand it, everything inside Kubernetes right now sort of runs in an EmptyDir:

```
Volumes:
  files-artifact-expansion:
    Type:  EmptyDir (a temporary directory that shares a pod's lifetime)
```

One could set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. The emptyDir itself lives on the node's filesystem under the kubelet's pod directory.

You can use a pod escalation to get access to the file system of the node (in case you don't already have direct access to the cluster node yourself); one possible sketch is shown at the end of this reply.

Thanks for the explanation and clarifications. I now have a much better understanding of Kurtosis.
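A minimal sketch of one way to do that node-level inspection, assuming your kubectl is new enough to have `kubectl debug node` (the node name and image are placeholders; the debug pod mounts the node's root filesystem at /host):

```bash
# Pick a node to inspect
kubectl get nodes

# Start an interactive debug pod on that node; the node's root filesystem
# gets mounted at /host inside the pod
kubectl debug node/<node-name> -it --image=busybox

# From inside the debug pod: emptyDir volumes (including the
# files-artifact-expansion directories) live under the kubelet's pod dirs
ls /host/var/lib/kubelet/pods/
```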
-
Thanks; that's good to know! We'd been using DigitalOcean's default PV which I think is analogous to S3 buckets (which can be subdivided).
Yep; most Kubernetes PV drivers don't support ReadWriteMany and it makes sense to me. While it's convenient for dev, the "one writable filesystem mounted in multiple places" pattern means that you're running a high risk of data corruption. We've often received the request, "why don't you let me mount a volume across many containers like Docker does?" Docker can do this because it's all local filesystem disk, but that breaks down in the k8s (and Prod) world for exactly the data corruption issue. To make Kurtosis usable on both Dev and Prod, we're trying to figure out a happy medium that allows for flexible, easy dev workflows while still being safe for Prod.
Thanks; that's a good flag on latency. Do you have a preferred PV driver that you use right now that works?
Basically - all the storage for everything right now is ephemeral, though we're going to be changing this in the next 1-2 months.
Yep, definitely an option, though I'm not very inclined to do this because it requires your Nodes to have tons of memory (which is more expensive). Perhaps it's an advanced option that we'd allow in the medium/long-term future for extremely performance-intensive applications.
I'd strongly recommend just using
Awesome; glad it helped! I'll also add this to an FAQ on our docs to help folks with similar questions :)