[Tests] gettimeofday syscall test hangs with ETHREADS=1 #380
Comments
@SeanTAllen I thought #334 is fixed with your PR #372 |
@SeanTAllen this reproduces more easily with run-sw. |
That error message no longer exists. If you are still getting an error related to next_delay_ns then you aren't running the most recent code. |
@SeanTAllen I cleaned and rebuilt with the latest code. Yes, I no longer get this error, but the test now makes the gettimeofday system call in an endless loop and never stops. I ran command: It makes this system call infinitely: (This line repeats millions of times) |
I'm able to reproduce locally. |
Interesting. The basic/clock test that also calls gettimeofday does not hang. |
@SeanTAllen yes, it only fails with ETHREADS=1. It was failing in the sgx-lkl nightly pipeline, where we run tests with different ETHREADS values such as 1, 4, and 8. This test was always failing. |
How could CI be passing if this has always failed for all different numbers of ethreads? I think I'm misunderstanding you. |
@SeanTAllen the normal sgx-lkl pipeline runs with ETHREADS=CPU core count (which is 4 or 8). In the nightly sgx-lkl build we run with different ETHREADS values (1, 4, 8), and it fails only for ETHREADS=1. |
Gotcha |
(@hukoyu The normal pipeline uses ETHREADS=8 for everything. See also the default in template.yml.) |
So the problem here is with the cooperative scheduler. The test in question never yields the CPU. It calls gettimeofday repeatedly, and nothing in the existing LKL codebase will cause it to yield. It works with more than 1 ETHREAD because another core is available to receive the signal from alarm and terminate the test. I'm looking at how lthreads might occasionally yield to allow other work to be done. It might take a while to come up with a satisfactory solution. |
I'm marking this as p2 since using a single ethread is an edge case which likely won't ever be used in deployments. |
I agree that a single ethread can be considered an edge case. I do think the underlying issue needs to be addressed, as it can reappear with more ethreads given the right program. Unless a thread in a user program calls sleep or similar, it won't yield to the scheduler, so if there are more such user threads than ethreads, progress will probably halt. We need additional cooperation points in the scheduler to make cooperative scheduling work better, especially as the Linux programs we are attempting to run were written with a preemptive scheduler in mind. |
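To make the starvation argument concrete, here is a minimal sketch in Python, not sgx-lkl or lthread code. It models one ethread as a single round-robin loop over generator-based tasks, where each `yield` stands for a cooperation point. All names (`run_single_ethread`, `poller`, `alarm`, `simulate`) are hypothetical, the "gettimeofday call" is only a counter increment, and `max_calls` is an artificial bound so the non-yielding case terminates instead of hanging.

```python
from collections import deque


def run_single_ethread(tasks):
    """Round-robin over tasks on one worker. A task keeps the worker
    until it yields (cooperation point) or finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run until the task's next yield
            ready.append(task)  # still alive: back of the queue
        except StopIteration:
            pass                # task finished, drop it


def poller(state, yield_every, max_calls):
    """Stand-in for the test loop that calls gettimeofday repeatedly.
    yield_every=0 means the loop never yields (the behaviour in this issue).
    max_calls is an artificial bound so this sketch always terminates."""
    while not state["stop"] and state["calls"] < max_calls:
        state["calls"] += 1  # hypothetical stand-in for one gettimeofday call
        if yield_every and state["calls"] % yield_every == 0:
            yield            # added cooperation point


def alarm(state, after_turns):
    """Stand-in for the alarm that should stop the test. It needs
    `after_turns` turns on the worker before it can fire."""
    for _ in range(after_turns):
        yield
    state["stop"] = True


def simulate(yield_every, max_calls=10_000, alarm_turns=3):
    """Run poller and alarm on one worker; return how many calls the poller made."""
    state = {"calls": 0, "stop": False}
    run_single_ethread([
        poller(state, yield_every, max_calls),
        alarm(state, alarm_turns),
    ])
    return state["calls"]


if __name__ == "__main__":
    print("no cooperation points:", simulate(yield_every=0))
    print("yield every 10 calls: ", simulate(yield_every=10))
```

With `yield_every=0` the poller uses its whole call budget before the alarm task gets a single turn, which is the analogue of the hang with one ethread. With a periodic yield, the alarm task is interleaved and stops the poller early. Where such yield points should actually live in the real scheduler is the open design question in this thread.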
There's a tester application for this on the |
Closing as a duplicate of #209. |
This test fails in the sgx-lkl nightly pipeline. In the pipeline it fails because the test harness times out after 5 minutes, but when run locally it hangs forever: gettimeofday is called in an endless loop.
Command:
SGXLKL_ETHREADS=1 make run-hw-single-gdb test=/ltp/testcases/kernel/syscalls/gettimeofday/gettimeofday02
Output: