Async signals aren't always delivered as expected #209

Open
SeanTAllen opened this issue May 6, 2020 · 2 comments
Labels: area: sgx-lkl (Core SGX-LKL functionality), bug, p0 (Blocking priority)

Comments

@SeanTAllen
Contributor

Asynchronous signals, such as those raised by the alarm and setitimer system calls, are not delivered as expected.

This can cause unexpected behavior in applications such as busybox ping. It uses alarm to wait before sending its next ping; during that wait it isn't executing system calls, so it hangs in sgx-lkl after the first ping is sent.
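
For anyone who wants a smaller test case than ping, here is a minimal sketch of the pattern described above: arm alarm, then wait for SIGALRM without making any system calls. This is not taken from ping-examples.zip or from busybox; the function name `alarm_fires_without_syscalls` and the `max_spins` bound are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is the only type that is safe to write here. */
static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/*
 * Hypothetical helper (not from the issue or its attachment).
 * Arms alarm(seconds) and then spins in user space, making no system calls,
 * until the handler runs or max_spins iterations have passed.
 * Returns 1 if SIGALRM was delivered, 0 if we gave up, -1 on setup failure.
 */
int alarm_fires_without_syscalls(unsigned seconds, unsigned long long max_spins)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    alarm_fired = 0;
    alarm(seconds);

    /* Busy-wait: the volatile read keeps the loop from being optimized away. */
    for (unsigned long long i = 0; i < max_spins && !alarm_fired; i++) {
        /* deliberately empty: no syscalls while waiting */
    }

    alarm(0); /* cancel the timer if it has not fired yet */
    return alarm_fired ? 1 : 0;
}
```

On a regular Linux host, `alarm_fires_without_syscalls(1, 1ULL << 40)` should return 1 after about a second. If signals are only handled on syscalls, I'd expect it to keep spinning until the bound is reached, but I haven't run this under sgx-lkl.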

@letmaik first reported this issue. It can be recreated by changing the CC_APP and CC_APP_CMDLINE options in the curl test to:

CC_APP=/bin/ping
CC_APP_CMDLINE=${CC_APP} -c 2 google.com

Attached is a zip file with three variations on how ping could be written, which can help with diagnosis. The variations that rely on a timer fail unless the timer is shorter than the time it takes to receive a packet back from the pinged host. If a packet is received after the timer expires, another ping is sent. Otherwise it hangs.

This behavior might be related to

https://github.com/lsds/lkl/blob/21cd7e192cf2a97ea69230b9c0c34da15d157579/arch/lkl/kernel/syscalls.c#L195

where it appears that signals are only handled on syscalls. However, there is more investigation to be done there.
ping-examples.zip

My suspicion is that if we checked for signals in futex_tick, or somewhere in the main scheduler loop, this issue would go away. That hasn't been verified, and it might address a symptom rather than the underlying cause. More investigation is needed.
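
To make the suspicion concrete, here is a toy model of the two policies: checking for pending signals only on the syscall path versus also checking on every scheduler tick. This is not SGX-LKL or LKL code. Every name in it (`toy_task`, `toy_deliver_pending`, `toy_run`) is hypothetical, and it says nothing about where such a check would actually belong.

```c
#include <stdbool.h>

/* Toy stand-in for a blocked userspace thread; not an SGX-LKL structure. */
struct toy_task {
    bool blocked;
    bool signal_pending;
    int signals_handled;
};

/* Deliver a pending signal, if any, and wake the task. */
static void toy_deliver_pending(struct toy_task *t)
{
    if (t->signal_pending) {
        t->signal_pending = false;
        t->signals_handled++;
        t->blocked = false;
    }
}

/*
 * Simulate `ticks` scheduler ticks for a task that never makes a syscall.
 * A timer marks a signal as pending at `timer_expiry_tick`.
 * With check_on_tick == false, the only delivery point would be the
 * syscall path, which this task never reaches.
 * Returns the number of signals the task handled.
 */
int toy_run(bool check_on_tick, int ticks, int timer_expiry_tick)
{
    struct toy_task task = { .blocked = true, .signal_pending = false,
                             .signals_handled = 0 };

    for (int tick = 0; tick < ticks; tick++) {
        if (tick == timer_expiry_tick)
            task.signal_pending = true; /* timer expired */

        if (check_on_tick)
            toy_deliver_pending(&task); /* hypothetical check in the tick path */

        /* No syscall happens here, so no syscall-time signal check runs. */
    }
    return task.signals_handled;
}
```

In this model `toy_run(false, 5, 2)` handles no signals and the task stays blocked, while `toy_run(true, 5, 2)` handles one. That matches the hang we see, but it only illustrates the hypothesis and does not show that a tick-time check is the right fix.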

SeanTAllen self-assigned this May 6, 2020
SeanTAllen added the area: sgx-lkl (Core SGX-LKL functionality), bug, and p1 (Medium priority) labels May 6, 2020
davidchisnall added a commit that referenced this issue May 26, 2020
We currently have two implementations of futexes, one provided by Linux
(which was not being built) and one in `src/enclave`.  The `src/enclave`
version was incomplete (for example, not implementing `FUTEX_WAKE_OP`)
and was under-tested.  After this change:

 - The `enclave_futex` implementation is used only for the LKL host ops.
 - The Linux futex is used for everything in userspace.

This should now give us parity with Linux for futex operations (and
ensure that we pick up new ones when they are added upstream, for
example the proposed operation to wait on multiple futexes).

This does not fix #209, but hopefully simplifies the fix because there
should now be only one way for a userspace thread to block.  It appears
as if the thread is blocking correctly in userspace and the scheduler is
receiving ticks, but is not waking up the blocking thread.

Fixes #180
Fixes #305
SeanTAllen pushed a commit that referenced this issue May 26, 2020
prp added this to the Milestone 1 milestone Jun 25, 2020
davidchisnall added the needs-triage (Bug does not yet have a priority assigned) label Jul 28, 2020
@SeanTAllen
Contributor Author

#673 appears to be another case of the same problem, so we've closed it as a duplicate. The relevant information from #673 is that SIGALRM is never delivered in a test that verifies sigsuspend() is working. Additional details are available in #673.
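
For reference, a test of that shape might look roughly like the sketch below. This is not the test from #673; the function name `sigsuspend_sees_alarm` is made up, and the code only follows the usual POSIX pattern of blocking SIGALRM, arming alarm, and waiting in sigsuspend() with the signal unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigalrm = 0;

static void handle_sigalrm(int signo)
{
    (void)signo;
    got_sigalrm = 1;
}

/*
 * Hypothetical helper (not the test from #673).
 * Returns 1 if SIGALRM interrupted sigsuspend(), 0 if the wait ended
 * for another reason, -1 on setup failure.
 */
int sigsuspend_sees_alarm(unsigned seconds)
{
    struct sigaction sa;
    sigset_t block_alrm, old_mask, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigalrm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block SIGALRM so it cannot arrive before we are waiting for it. */
    sigemptyset(&block_alrm);
    sigaddset(&block_alrm, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block_alrm, &old_mask) != 0)
        return -1;

    got_sigalrm = 0;
    alarm(seconds);

    /* Wait with SIGALRM unblocked; sigsuspend() returns -1 with EINTR
     * once a handler has run. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);
    while (!got_sigalrm) {
        if (sigsuspend(&wait_mask) == -1 && errno != EINTR)
            break;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL); /* restore the caller's mask */
    return got_sigalrm ? 1 : 0;
}
```

On a regular Linux host `sigsuspend_sees_alarm(1)` should return 1 after about a second. If SIGALRM is never delivered, as reported in #673, the call would be expected to block in sigsuspend() indefinitely.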

SeanTAllen added the p0 (Blocking priority) label and removed the p1 (Medium priority) label Jul 29, 2020
SeanTAllen removed their assignment Jul 29, 2020
SeanTAllen removed the needs-triage (Bug does not yet have a priority assigned) label Jul 29, 2020
@davidchisnall
Contributor

Also part of #709

4 participants