The iperf3 server hangs while printing the report #1735

Open
RizziMau opened this issue Jul 18, 2024 · 8 comments

@RizziMau

Context

I am using iperf3 in a product that tests the UDP data throughput of mobile networks.
Sometimes the iperf3 server hangs while printing the report, and all subsequent tests fail with the message
iperf3: error - the server is busy running a test. Try again later.

The only solution I have found is to kill and restart the iperf3 server.

  • Version of iperf3: 3.16

  • Hardware:
    on the iperf client side: Samsung A52s
    on the iperf server side: Server with Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

  • Operating system:
    on the iperf client side: Android 11
    on the iperf server side: Ubuntu 22.04.4 LTS

Bug Report

  • Expected Behavior
    The server should not hang

  • Actual Behavior
    The server sometimes hangs while printing the report, and all subsequent tests fail with the message
    iperf3: error - the server is busy running a test. Try again later.

The issue is not systematic but occurs after several hours.
When the iperf3 server hangs:

  • the last transfer was in reverse mode (server-to-client direction)
  • the transfer was completed without errors on the client side
  • the client has received the report from the server and printed it
  • the server hangs while printing the report

The only solution I have found is to kill and restart the iperf3 server.

  • Steps to Reproduce
    Execute an iperf test (from client to server) and an iperf reverse test (from server to client) every 10 minutes.

The iperf3 server is started with the following command:
iperf3 --server --interval 0 -p 5202 -1
Note that the "-1" option causes iperf3 to exit after one transfer, but a daemon restarts it after 2 seconds.
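A minimal sketch of the restart arrangement described above, extended with a hard deadline so that a hung instance is killed instead of blocking later tests. The server command line is copied from the report; `run_one_shot`, `supervise`, and the deadline value are hypothetical names and assumptions, not part of iperf3 or of the reporter's daemon. A fixed deadline also ends a healthy instance that is merely waiting for a client, and it could cut short a test that starts just before the deadline expires.

```python
import subprocess
import time

# Server command line taken from the report above.
SERVER_CMD = ["iperf3", "--server", "--interval", "0", "-p", "5202", "-1"]


def run_one_shot(cmd, deadline_s):
    """Run one one-shot instance; kill it if it outlives deadline_s.

    Returns "exited" if the process ended on its own, "killed" otherwise.
    """
    try:
        subprocess.run(cmd, timeout=deadline_s, check=False)
        return "exited"
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return "killed"


def supervise(cmd, deadline_s, restart_delay_s=2.0, rounds=None):
    """Restart cmd after every exit or kill; rounds=None loops forever."""
    done = 0
    while rounds is None or done < rounds:
        outcome = run_one_shot(cmd, deadline_s)
        print(f"server instance {outcome}", flush=True)
        done += 1
        # Assumed to mirror the 2-second restart delay mentioned in the report.
        time.sleep(restart_delay_s)
    return done
```

For example, `supervise(SERVER_CMD, deadline_s=900)` would replace the server at least every 15 minutes; that value is only illustrative.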

The iperf3 client command (on the Android phone) is:
iperf3 --forceflush -c x.x.x.x -V -p 5202 -u -t 15 -i 5 -fK -4 -b 5000000 -l 1200 -P 4 -O 0
(x.x.x.x is the IP address of the server)
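The reproduction cycle can be scripted roughly as follows. The client arguments are copied from the command above, and `-R` selects reverse mode; `client_cmd`, `run_cycle`, and the pause between the two tests (chosen to be long enough for the one-shot server to be restarted) are assumptions made for illustration.

```python
import subprocess
import time


def client_cmd(server_ip, reverse=False):
    """Build the client command from the report; add -R for reverse mode."""
    cmd = ["iperf3", "--forceflush", "-c", server_ip, "-V", "-p", "5202",
           "-u", "-t", "15", "-i", "5", "-fK", "-4", "-b", "5000000",
           "-l", "1200", "-P", "4", "-O", "0"]
    if reverse:
        cmd.append("-R")
    return cmd


def run_cycle(server_ip, run=subprocess.run, gap_s=5.0):
    """Run one forward and one reverse test; return both exit codes."""
    codes = []
    for reverse in (False, True):
        codes.append(run(client_cmd(server_ip, reverse)).returncode)
        # Assumed pause so the restarted one-shot server is listening again.
        time.sleep(gap_s)
    return codes
```

Calling `run_cycle("x.x.x.x")` every 10 minutes, for example from a scheduler, approximates the schedule described in the report.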

I've attached the logs of iperf3 server executed with the --debug=3 option:

iperf3server_debug3_ok.log reports the correct behaviour:
at timestamp 06:54:11 iperf3 prints the report and exits, according to the -1 option
iperf3server_debug3_ok.log

iperf3server_debug3_blocked.log reports the wrong behaviour:
at timestamp 06:42:15 iperf3 prints the report but it does not exit
at timestamp 06:52:23 iperf3 has been killed and it traces "iperf3: interrupt - the server has terminated"
iperf3server_debug3_blocked.log

@davidBar-On
Contributor

davidBar-On commented Jul 18, 2024

@RizziMau, the problem may be that the server is waiting for the "Done" from the client. However, since the client has already terminated, the control socket between the processes should no longer be available, and the server should have failed or timed out. It is therefore not clear why the server did not exit.

In any case, a few questions:

  1. Just to make sure: the failed log is from a reverse test (-R) and the successful log is from a non-reverse test. Is this because all the reverse tests have this error?
  2. Again, to make sure, is the client version also 3.16?
  3. Can you recreate a failed log (using --debug=3), but this time run the client a few times while the server is stuck (before terminating it)? When the client displays the "the server is busy running a test" message, the server should display "successfully sent ACCESS_DENIED to an unsolicited connection request during active test". Does the server display these messages? (Running the client a few times is to make sure that some of the server messages are displayed, as they are not flushed.)
  4. When the test is running there should be 5 server threads: the main thread and one for each of the 4 streams. When the server gets stuck:
    • Are all 5 threads still running?
    • How much CPU does each of them consume?
  5. Are you able to build the iperf3 executable for the server (Linux)? I am asking in case it becomes useful to create special versions that may help the analysis (usually with more debug messages).
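The per-thread check in question 4 could be done on the Linux server with a small script such as the sketch below, which reads each thread's state and accumulated CPU time from `/proc`. The function name `thread_cpu_times` is hypothetical, and the approach is Linux-specific.

```python
import os


def thread_cpu_times(pid):
    """Return {tid: (state, cpu_seconds)} for every thread of pid (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                stat = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were iterating
        # The command name is wrapped in parentheses and may contain spaces,
        # so split on the last ")" before tokenising the remaining fields.
        fields = stat[stat.rindex(")") + 1:].split()
        state = fields[0]                                 # field 3 in proc(5)
        utime, stime = int(fields[11]), int(fields[12])   # fields 14 and 15
        result[int(tid)] = (state, (utime + stime) / ticks_per_s)
    return result
```

Sampling twice a few seconds apart shows which threads still exist and whether their CPU time is growing; a thread blocked in a wait would typically show state `S` with unchanged CPU time.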

@davidBar-On
Contributor

@RizziMau, I believe I understand what the problem is: the "DONE" state sent by the client is not received by the server, and the server waits for it forever. I have a proposed fix that uses the --rcv-timeout value as the timeout for this wait. Before I submit a PR for the fix, it would be very helpful if you could test at least the server side of it, to confirm that it does fix the problem.

Can you build and run iperf3 (at least for the server) from branch "issue-1735-timeout-select-when-not-in-running-state" in "https://github.com/davidBar-On/iperf.git" (git clone https://github.com/davidBar-On/iperf.git -b issue-1735-timeout-select-when-not-in-running-state)?
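The general idea of bounding the wait for a control message can be illustrated as follows. This is not the code from the branch, only a sketch of waiting on a socket with a timeout; it assumes the state arrives as a single byte, and `wait_for_state` is a hypothetical name.

```python
import select
import socket


def wait_for_state(ctrl_sock, timeout_s):
    """Wait up to timeout_s for one state byte on the control socket.

    Returns the byte as an int, or None if nothing arrived in time.
    Raises ConnectionError if the peer closed the connection.
    """
    readable, _, _ = select.select([ctrl_sock], [], [], timeout_s)
    if not readable:
        return None  # timed out: the caller can give up instead of hanging
    data = ctrl_sock.recv(1)
    if not data:
        raise ConnectionError("peer closed the control connection")
    return data[0]
```

With no timeout the same wait would block indefinitely when the message is lost, which matches the hang described in this issue.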

@RizziMau
Author

@davidBar-On, I'm testing your iperf3 version; I will keep you updated.

Just a remark:
for this issue, is it better to use the --rcv-timeout parameter, or to add a new parameter, e.g. --done-timeout or --close-timeout?

Iperf3 has different timeouts for different aspects:

  • --idle-timeout to restart the iperf3 server when it is idle
  • --connect-timeout for establishing the control connection
  • --rcv-timeout for data reception during data transfers

I think it would be preferable to use a specific timeout parameter for the closure of the control connection.

@davidBar-On
Contributor

@RizziMau, thanks for testing the change. It will help validate it in general (and, of course, show whether it solves your issue).

Regarding the use of --rcv-timeout: the iperf3 team does not like to add new options. Therefore, to have a better chance that the change will be merged into mainline, I try to reuse existing options when possible.

Currently the --rcv-timeout value is used only while test data is being sent (TEST_RUNNING state). Its value is usually related to the length of "network stuck" periods (except for very low test bandwidth, e.g. sending one packet every 10 seconds). Therefore, I believe that this value can also be used as the timeout for receiving control messages when test data is not being sent.

Note that, regardless of the option name, the change implements a timeout for most of the state-change control messages. I believe it solves a general issue in iperf3 where the server or client gets stuck if a control message is not received (your issue is one example).

@RizziMau
Author

RizziMau commented Sep 9, 2024

@davidBar-On, I've been using your version (from branch "issue-1735-timeout-select-when-not-in-running-state" in "https://github.com/davidBar-On/iperf.git") since July and I haven't encountered any issues.
Note that I am using that version only in server mode and with UDP transfers (at the moment, I have no way to test it in other modes, for example in client mode or with TCP transfers).
I hope the fix can soon be released in an official version.
Thank you.

@davidBar-On
Contributor

@RizziMau, I submitted the change as PR #1764. Hopefully it will be merged into the main branch.

@RizziMau
Author

@davidBar-On, I'm still using your iperf3 version (from branch "issue-1735-timeout-select-when-not-in-running-state" in https://github.com/davidBar-On/iperf.git) and sometimes I encounter segmentation faults.
These are rare cases; normally, everything works fine.

In my setup:

  • your iperf3 is running in server mode on an Ubuntu 22.04.4 LTS server
  • the official iperf3 version 3.16 is running in client mode on Samsung A52s Android 11 phones connected to a 4G network

Segmentation faults occur on the iperf3 server side during the connection setup from the client (in UDP reverse with 4 parallel data streams), as you can see in the attached log:
iperf3_segfault.txt
In these cases the iperf3 client exits reporting the message iperf3: error - unable to read from stream socket: Try again.

Could this occasional problem be due to your version of iperf3?

@davidBar-On
Contributor

@RizziMau, I am not sure what the problem is, but since the segmentation fault happens at the start of the test, it may be because of the issue that is fixed by PR #1801 (trying to cancel non-existing thread). I am not sure about that, since the client should also fail (unless the problem is related to a change between 3.16 and 3.17.1).

In any case, I merged the PR #1801 changes into branch "issue-1735-timeout-select-when-not-in-running-state". Can you build and run using the new code version? This will not fix the test failure, but at least instead of a SEGV there may be a useful error message.
