Installinator is broken on mio v1.0.2 due to there not being eventfds #6391

Closed
hawkw opened this issue Aug 19, 2024 · 5 comments

@hawkw

hawkw commented Aug 19, 2024

A recent Omicron TUF repo (from commit 58ae0d1) can no longer be installed on london. Instead, the mupdate process hangs at "Downloading Installinator and waiting for it to start", although the Wicket UI indicates that the installinator image is 100% downloaded:

[3 screenshots of the Wicket UI]

When tailing logs from Installinator on an affected sled, we see that the Installinator binary has failed with the error "Failed building the Runtime" due to an OS "No such file or directory" error.

BRM42220062 # svcs -a | grep oxide
online          0:00:34 svc:/oxide/installinator:default
BRM42220062 # tail -F $(svcs -L installinator)
[ Dec 28 00:00:02 Enabled. ]
[ Dec 28 00:00:34 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/installinator/installinator install --bootstrap-sled --from-ipcc --install-on-gimlet --stay-alive &"). ]
[ Dec 28 00:00:34 Method "start" exited with status 0. ]
thread 'main' panicked at installinator/src/main.rs:15:7:
Failed building the Runtime: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This error is coming not from our code, but from Tokio (specifically, the #[tokio::main] macro calling into tokio::runtime::Builder::build()). It appears that we can no longer start a Tokio runtime in the installinator binary.
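For reference, the `Os { code: 2, kind: NotFound, ... }` text in the panic is the `Debug` form of a `std::io::Error` wrapping errno 2 (ENOENT), which is what an open of a nonexistent path returns on Unix-like systems. The following is a minimal standard-library sketch of that, not Tokio's or mio's actual code path; the helper names `open_error` and `debug_form` are hypothetical.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Attempt to open `path` read-only and return the error, if there was one.
/// (Hypothetical helper, used only for illustration.)
fn open_error(path: &Path) -> Option<io::Error> {
    File::open(path).err()
}

/// Render an I/O error the way a `{:?}` in a panic message would.
fn debug_form(err: &io::Error) -> String {
    format!("{:?}", err)
}
```

Calling `open_error` on a path that does not exist yields an error whose `kind()` is `io::ErrorKind::NotFound` and whose `raw_os_error()` is `Some(2)` on Unix-like systems, matching the shape of the panic above.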

This seems to have resulted from updating to mio v1.0.2. PR tokio-rs/mio#1826 changed illumos builds of mio to use the eventfd-based waker implementation, rather than the pipe-based waker. But the installinator trampoline image does not have eventfd (see footnote 1):

BRM42220062 # find /kernel /usr/kernel -name eventfd'*'
/usr/kernel/drv/amd64/eventfd

jclulow@atrium ~
 $ find /kernel /usr/kernel -name eventfd'*'
/usr/kernel/drv/amd64/eventfd
/usr/kernel/drv/eventfd.conf

This does not mean that any Rust code using mio v1.0.2+ is broken on Helios/illumos; this specifically affects the recovery/trampoline images, because they don't have eventfd. Normal illumos should be able to run mio v1.0.2 without encountering this problem.
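One way a binary could fail with a clearer message on such an image is to check for the eventfd device node before starting the runtime. This is only a sketch under assumptions: the node name `eventfd` under the device directory follows the `/dev/eventfd` path mentioned in this thread, and `check_eventfd_node` is a hypothetical helper, not something Installinator, Tokio, or mio provides.

```rust
use std::path::Path;

/// Return Ok(()) if `<dev_dir>/eventfd` exists, or a descriptive error
/// otherwise. `dev_dir` is a parameter (rather than a hard-coded "/dev")
/// so the check can be exercised against a scratch directory.
fn check_eventfd_node(dev_dir: &Path) -> Result<(), String> {
    let node = dev_dir.join("eventfd");
    if node.exists() {
        Ok(())
    } else {
        Err(format!(
            "{} is missing; an eventfd-based waker cannot be created on this image",
            node.display()
        ))
    }
}
```

In a real binary this would be called with `Path::new("/dev")` before the runtime is built, so the failure names the missing device instead of surfacing as a bare "No such file or directory".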

Important

Critically, this means that any Omicron TUF repo built since d7d4bea is, for all intents and purposes, non-viable. You can't install it on any system. Don't try to.

Footnotes

  1. Per @rmustacc, "The eventfd.conf is in the elide list for recovery, but not the driver. However, because it's a pseudo-device driver, missing that will mean that it won't get created."

@hawkw hawkw changed the title mio 1.0.2 has broken Installinator due to there not being eventfds Installinator is broken on mio v1.0.2 due to there not being eventfds Aug 19, 2024
@rmustacc

To clarify what's going wrong here: eventfd() returns a file descriptor. On illumos, that file descriptor is created through a pseudo-device driver, by opening /dev/eventfd. For a pseudo-device driver to be successfully instantiated, the system relies on the <driver>.conf file, which tells it who the parent is and how many instances to create. Pseudo-device drivers differ from hardware drivers in that there is no self-identifying way to create them, so the .conf file is much more load-bearing than it is for, say, a NIC driver, where instances are created automatically based on the associated Open Firmware/device tree style compatible property and the driver_aliases file.
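The distinction above (driver module present, .conf file absent) can be checked mechanically. Below is a small sketch that mirrors the `find` output earlier in this issue; it assumes the layout `<kernel_dir>/drv/amd64/<name>` for the module and `<kernel_dir>/drv/<name>.conf` for the configuration file, and the type and function names are hypothetical.

```rust
use std::path::Path;

/// Which of a pseudo-device driver's two files are present.
#[derive(Debug, PartialEq, Eq)]
enum PseudoDriverState {
    /// Neither the module nor the .conf file was found.
    Absent,
    /// Module without a .conf file: no instance will be created.
    ModuleOnly,
    /// A .conf file without the module it configures.
    ConfOnly,
    /// Both files are present.
    Complete,
}

/// Inspect `kernel_dir` (for example /usr/kernel) for driver `name`.
/// The directory layout is an assumption taken from the `find` output
/// in this thread.
fn pseudo_driver_state(kernel_dir: &Path, name: &str) -> PseudoDriverState {
    let drv = kernel_dir.join("drv");
    let module = drv.join("amd64").join(name).is_file();
    let conf = drv.join(format!("{}.conf", name)).is_file();
    match (module, conf) {
        (false, false) => PseudoDriverState::Absent,
        (true, false) => PseudoDriverState::ModuleOnly,
        (false, true) => PseudoDriverState::ConfOnly,
        (true, true) => PseudoDriverState::Complete,
    }
}
```

Under these assumptions, the recovery image in this issue would report `ModuleOnly` for `eventfd` under `/usr/kernel`, while the full system shown for comparison would report `Complete`.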

@sunshowers

Thanks for jumping on this! As the person responsible for landing Tokio 1.39.2 in omicron, I've added a topic to tomorrow's control plane sync about considering backing it out for r10. (And reintroducing it in r11 so we get more dogfooding time.)

@hawkw

hawkw commented Aug 19, 2024

I'm presently attempting to mupdate london with a TUF repo that should include @jclulow's change from oxidecomputer/helios#171; will be able to report back as to whether or not Installinator is un-broken by it (I suspect it will be, but you never know). That doesn't necessarily mean we shouldn't also back out the dependency bump, but we'll shortly know if it works on the recovery image or not.

@sunshowers

sunshowers commented Aug 21, 2024

@hawkw is this resolved? Sounded from yesterday's meeting like helios#171 did the job. And we've backed out the Tokio update for r10 anyway.

@hawkw

hawkw commented Aug 21, 2024

Yup, oxidecomputer/helios#171 fixed this!

@hawkw hawkw closed this as completed Aug 21, 2024
sunshowers added a commit that referenced this issue Aug 27, 2024
Tokio 1.39/mio 1.0 switches out the illumos impl to being eventfd based. For release 10 we decided that that was too risky, so we switched back to Tokio 1.38.

Now that the r10 branch has been cut, we can go back and update Tokio to 1.39.3. We'd like to land this early in the cycle to get as much soak time as possible.

See:

* #6356
* #6249
* oxidecomputer/helios#169
* oxidecomputer/helios#171
* #6391