Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send all flushes through the deferred jobs queue (if present) #1575

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Nov 26, 2024

Unclear if this actually matters, but I'm worried that flushes and barriers can skip the deferred job queue.

@mkeeter mkeeter requested review from jmpesp and leftwo November 26, 2024 21:49
Base automatically changed from mkeeter/deactivate-need-flush to main November 27, 2024 15:59
Copy link
Contributor

@jmpesp jmpesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear if this actually matters, but I'm worried that flushes and barriers can skip the deferred job queue.

I don't think I understand the concern - my mental model of the deferred job queue is that it offloads the encryption and decryption from the main loop?

} else {
info!(self.log, "not ready to deactivate client {i}");
if !self.deferred_ops.is_empty() {
info!(self.log, "waiting for deferred ops...");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a behaviour change, no?

I think we should allow the guest to deactivate at any time. I imagine in a scenario where the Upstairs / Propolis is stuck, we would still want to shut down or migrate a instance, and if Propolis calls deactivate that could block those.

@@ -1278,10 +1294,12 @@ impl Upstairs {
}
UpstairsState::Active => (),
}
if !self.downstairs.can_deactivate_immediately() {
if self.need_flush.is_some()
|| !self.deferred_ops.is_empty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here

@mkeeter
Copy link
Contributor Author

mkeeter commented Dec 12, 2024

My concern is the following sequence of events:

  • The guest submits a read, which is sent to the Downstairs.
  • The guest submits a large write. It is moved to the deferred queue for processing
  • The guest submit deactivate. In the code on main:
    • This checks can_deactivate_immediately, which returns false (because the read is pending)
    • Then, it calls submit_flush, which submits a flush immediately
  • The read returns from each Downstairs
  • That flush returns from eachDownstairs
    • The Upstairs calls try_deactivate on each Downstairs, which returns true
    • The Upstairs deactivates
  • The deferred write encryption finishes
  • The Upstairs tries to send the deferred write to the Downstairs, and panics because we're in an invalid state

However, after trying to build a unit test for this, I couldn't trigger any issues:

  • The deactivate message from the Guest is deferred if submitted after the write, so that implicitly puts an order on the final flush
  • If instead we submit the deactivation immediately before the write, then the Upstairs is marked as Deactivating, and so the write returns with an error immediately

In other words, this is safe because the deactivation request goes through the same queued as deferred writes, even though the final flush does not go through that queue.

Automatically generated flushes and barriers are still unqueued, so I need to think more about whether this could cause problems, but my immediate concerns are lessened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants