recovery elides too greedily and too deep; awakens eventfd #171

Merged 1 commit on Aug 19, 2024

jclulow (Collaborator) commented Aug 19, 2024

tokio 1.39 and above use eventfd(7) instead of the more traditional self-pipe mechanism for injecting wake-ups into the event loop. We are, regrettably, not including the eventfd.conf configuration file for the eventfd kernel module in the recovery image. This causes the installinator to fail to start:

```
BRM42220062 # svcs -a | grep oxide
online          0:00:34 svc:/oxide/installinator:default
BRM42220062 # tail -F $(svcs -L installinator)
[ Dec 28 00:00:02 Enabled. ]
[ Dec 28 00:00:34 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/installinator/installinator install --bootstrap-sled --from-ipcc --install-on-gimlet --stay-alive &"). ]
[ Dec 28 00:00:34 Method "start" exited with status 0. ]
thread 'main' panicked at installinator/src/main.rs:15:7:
Failed building the Runtime: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Interestingly, this didn't cause the service to fail, or even to retry... It seems like the service uses the "transient" duration, which may be another (separate) bug.
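For readers unfamiliar with the shape of that panic message, here is a minimal std-only sketch of how an error like it can arise. It assumes the runtime fails because a device node it tries to open is absent; the path and the helper names `open_device` and `expect_style` are hypothetical and are not taken from tokio, mio, or the installinator.

```rust
use std::fs::File;
use std::io;

/// Try to open a device node, roughly as a waker implementation might at start-up.
fn open_device(path: &str) -> io::Result<File> {
    File::open(path)
}

/// Render an error the way `Result::expect` does in its panic message:
/// the caller's message, a colon, then the error's `Debug` output.
fn expect_style(msg: &str, err: &io::Error) -> String {
    format!("{}: {:?}", msg, err)
}

fn main() {
    // Hypothetical path, chosen only because it should not exist on any system.
    let err = open_device("/nonexistent/eventfd-demo").unwrap_err();

    // A missing path surfaces as ENOENT, which Rust maps to ErrorKind::NotFound.
    println!("{}", expect_style("Failed building the Runtime", &err));
}
```

On a typical Unix system this prints a line of the same shape as the one in the service log: an `Os` error with code 2 and kind `NotFound`. The sketch only illustrates where the text comes from; it says nothing about which file the real runtime was looking for.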

hawkw (Member) commented Aug 19, 2024

This should (hopefully) fix oxidecomputer/omicron#6391.

rmustacc (Contributor) left a comment

The dark event will not avail you, flame of Udûn.

@jclulow jclulow merged commit 5957f05 into master Aug 19, 2024
sunshowers added a commit to oxidecomputer/omicron that referenced this pull request Aug 20, 2024
Tokio 1.39 updated its mio dependency to 1.0, which changed the waker impl on illumos from a self-pipe to eventfd. That has caused several issues already:

* oxidecomputer/helios#169
* oxidecomputer/helios#171

Based on these and the potential for other lurking issues, we're making a policy decision to roll back to 1.38 (mio 0.8) for r10. We can't be off of the train forever so we're aiming to land the 1.39 update early in the r11 cycle.

This backs out commit d7d4bea.
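As background on the "self-pipe" waker mentioned in this commit message, the following is a rough std-only sketch of the pattern, not mio's implementation. It uses a Unix socket pair as a stand-in for pipe(2), and a blocking read as a stand-in for the poll call a real event loop would make; the `Waker` type and `wait_for_wakeup` function are hypothetical names.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Write end of the pair: any thread holding this can interrupt the loop.
struct Waker {
    tx: UnixStream,
}

impl Waker {
    fn wake(&self) -> io::Result<()> {
        // One byte is enough; its value is irrelevant.
        (&self.tx).write_all(&[1])
    }
}

/// Block until at least one wake-up byte arrives, then drain what is available.
/// A real event loop would watch this descriptor alongside its other sources.
fn wait_for_wakeup(rx: &mut UnixStream) -> io::Result<usize> {
    let mut buf = [0u8; 64];
    rx.read(&mut buf)
}

fn main() -> io::Result<()> {
    let (tx, mut rx) = UnixStream::pair()?;
    let waker = Waker { tx };

    // Another thread injects the wake-up while the main thread is blocked.
    let handle = thread::spawn(move || waker.wake());

    let drained = wait_for_wakeup(&mut rx)?;
    handle.join().expect("waker thread panicked")?;
    println!("woken up after draining {} byte(s)", drained);
    Ok(())
}
```

An eventfd-based waker replaces the two-descriptor pair with a single descriptor backed by a kernel counter, which is why it depends on eventfd support being present on the system.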
sunshowers added a commit to oxidecomputer/omicron that referenced this pull request Aug 27, 2024
Tokio 1.39/mio 1.0 switches out the illumos impl to being eventfd based. For release 10 we decided that that was too risky, so we switched back to Tokio 1.38.

Now that the r10 branch has been cut, we can go back and update Tokio to 1.39.3. We'd like to land this early in the cycle to get as much soak time as possible.

See:

* #6356
* #6249
* oxidecomputer/helios#169
* oxidecomputer/helios#171
* #6391