Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zio: Avoid sleeping in the I/O path #16785

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

markjdb
Copy link
Contributor

@markjdb markjdb commented Nov 19, 2024

The change modifies zio_delay_interrupt(), used in testing environments, to avoid sleeping, as this is not allowed on FreeBSD.

Motivation and Context

When running the ZTS in a FreeBSD VM, I often see the following panic:

panic: sleepq_add: td 0xfffff80104a2f000 to sleep on wchan 0xffffffff813ebb21 with sleeping prohibited
cpuid = 1
time = 1732040928
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00d8d5aa30
vpanic() at vpanic+0x136/frame 0xfffffe00d8d5ab60
panic() at panic+0x43/frame 0xfffffe00d8d5abc0
sleepq_add() at sleepq_add+0x338/frame 0xfffffe00d8d5ac10
_sleep() at _sleep+0x1f3/frame 0xfffffe00d8d5acb0
pause_sbt() at pause_sbt+0x7d/frame 0xfffffe00d8d5ace0
zio_delay_interrupt() at zio_delay_interrupt+0x1f9/frame 0xfffffe00d8d5ad40
g_io_deliver() at g_io_deliver+0x2b0/frame 0xfffffe00d8d5ad90
g_disk_done() at g_disk_done+0xee/frame 0xfffffe00d8d5add0
vtblk_done_completed() at vtblk_done_completed+0x38/frame 0xfffffe00d8d5ae10
vtblk_vq_intr() at vtblk_vq_intr+0xaf/frame 0xfffffe00d8d5ae60
ithread_loop() at ithread_loop+0x266/frame 0xfffffe00d8d5aef0
fork_exit() at fork_exit+0x82/frame 0xfffffe00d8d5af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00d8d5af30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

In particular, we shouldn't assume that it's ok to sleep in I/O completion threads.

This change modifies zio_delay_interrupt() to avoid violating this invariant.

Description

The patch removes a special case in zio_delay_interrupt() such that the I/O is always deferred to a taskqueue.

How Has This Been Tested?

I've run through the ZTS on FreeBSD with this patch.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Signed-off-by: Mark Johnston <[email protected]>
Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. Just makes me wonder about usefulness of higher-resolution taskq_dispatch_delay() variant. FreeBSD could do it easily.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Nov 19, 2024
@markjdb
Copy link
Contributor Author

markjdb commented Nov 19, 2024

Makes sense to me. Just makes me wonder about usefulness of higher-resolution taskq_dispatch_delay() variant. FreeBSD could do it easily.

Looking at the existing callers, I don't see much demand for higher-resolution timeouts. Though, ddi_get_lbolt64() seems fairly coarse on FreeBSD, I guess the maximum resolution there is ~10us.

@behlendorf behlendorf merged commit d76d79f into openzfs:master Nov 20, 2024
23 of 24 checks passed
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 20, 2024
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by:  Alexander Motin <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes openzfs#16785
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
zio_delay_interrupt(), apparently used for fault injection, is executed
in the I/O pipeline.  It can cause the calling thread to go to sleep,
which is not allowed on FreeBSD.  This happens only for small delays,
though, and there's no apparent reason to avoid deferring to a taskqueue
in that case, as it already does otherwise.

Simply go to sleep unconditionally.  This fixes an occasional panic I
see when running the ZTS on FreeBSD.  Also remove an unhelpful comment
referencing the non-existent timeout_generic().

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by:  Alexander Motin <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes openzfs#16785
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants