Note: the issue was created automatically with bugzilla2github tool
Bugzilla Bug ID: 100
Date: 2016-10-05 11:20:24 +0200
From: Sorin Vultureanu <[email protected]>
To: Sorin Vultureanu <[email protected]>
Last updated: 2016-11-22 15:48:11 +0100
Bugzilla Comment ID: 187
Date: 2016-10-05 11:20:24 +0200
From: Sorin Vultureanu <[email protected]>
The notify context locks the socket and queues many packets for transmission. At some point it cannot queue more packets and remains in a loop: the send buffer is never drained, because the socket is still locked.
This needs some redesign.
The change suggested by Rujun does not work, as it produces a loop that was not observed before.
The original FreeBSD code has the same statement here; this if handles a different case and is not related to the reported problem.
From: Li, Rujun [mailto:[email protected]]
Sent: Tuesday, August 30, 2016 11:48 AM
To: Sorin Vultureanu [email protected]; Riggs Justin [email protected]
Cc: [email protected]
Subject: RE: the ofp-tcp cannot work when send a relatively big file in my environment
Hi Sorin
I found that the if condition in ofp_tcp_output.c, around line 344,
len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off)
may cause the problem. When so_snd.sb_cc is larger than sendwin, len does not change, which causes
len >= tp->t_maxseg to always be false.
When I change it to
len = (long)so->so_snd.sb_cc - off
and make SOCKBUF_LEN larger, I can request 100K files successfully.
I hope this helps!
Best Wishes,
Rujun, Li
From: Sorin Vultureanu [mailto:[email protected]]
Sent: Tuesday, August 30, 2016 3:19 PM
To: Li, Rujun [email protected]; Riggs Justin [email protected]
Cc: [email protected]
Subject: RE: the ofp-tcp cannot work when send a relatively big file in my environment
Hi Rujun, Justin,
Thank you for the feedback and support on this issue.
I am able to reproduce the two issues on my side: the failure with big transfers and also the hang of webserver2.
The webserver performance is limited by this. Intermittently I am able to get 260MB of traffic using one core; at other times the speed drops because so->so_snd fills up and synchronisation is needed.
The webserver2 hang also seems to be caused by so->so_snd filling up.
Rujun, I will continue the investigation using your observation and work on a solution.
Kind Regards,
Sorin
From: Li, Rujun [mailto:[email protected]]
Sent: Tuesday, August 30, 2016 4:08 AM
To: Riggs Justin [email protected]; Sorin Vultureanu [email protected]
Cc: [email protected]
Subject: RE: the ofp-tcp cannot work when send a relatively big file in my environment
Hi
When I added some extra output, I found that in the function ofp_tcp_output in ofp_tcp_output.c the variable snd_una is unchanged while snd_nxt - snd_una keeps growing, until finally the window has no space left to hold more data. I guess this may be the problem?
The following is part of my output:
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 27789
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 928
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 28313
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 404
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 28313
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 916
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 28837
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 392
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 28837
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 859
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 335
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 335
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 335
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 335
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
I 6313 68:2031253248 ofp_tcp_output.c:345] the buffer or window size we can use is 335
I 6313 68:2031253248 ofp_tcp_output.c:218] snd_una = 2119005760
I 6313 68:2031253248 ofp_tcp_output.c:219] snd_nxt-snd_una = 29361
Best Wishes,
Rujun, Li
From: Riggs Justin [mailto:[email protected]]
Sent: Tuesday, August 30, 2016 1:02 AM
To: Sorin Vultureanu [email protected]
Cc: [email protected]; Li, Rujun [email protected]
Subject: RE: the ofp-tcp cannot work when send a relatively big file in my environment
Hi Sorin,
I will try to get the gdb stack traces today. I’ll pass them along after I have them.
In the meantime, I’m wondering if other people are having similar problems with TCP. It seems odd that I can’t run one of the example apps for more than a few minutes. So I’m wondering if there are known problems with TCP robustness.
I’m concerned that I might be doing something incorrect that is causing the problem to occur. Is there a preferred way to start the webserver2 app regarding cmdline switches and ODP_PLATFORM_PARAMS?
Thanks,
Justin Riggs
From: Sorin Vultureanu [mailto:[email protected]]
Sent: Thursday, August 25, 2016 2:16 AM
To: Riggs Justin [email protected]
Cc: [email protected]; Li, Rujun [email protected]
Subject: RE: the ofp-tcp cannot work when send a relatively big file in my environment
Hi Justin,
Can you attach gdb and send a call trace for the hung thread, or run the whole application under gdb and capture the application dump?
200% means that two worker thread cores are busy. How many cores do you start the application with?
Kind Regards,
Sorin
From: openfastpath [mailto:[email protected]] On Behalf Of Li, Rujun
Sent: Thursday, August 25, 2016 3:27 AM
To: Riggs Justin [email protected]; [email protected]
Subject: Re: [openfastpath] the ofp-tcp cannot work when send a relatively big file in my environment
I used tcpdump to try to figure out where the problem is. I found that when I request a 100K file, ofp cannot send the whole content of the file and finish with a normal close. When it stops, the webserver no longer sends any ARP reply, so I think the network stack may have hung. I then used top to check CPU and memory usage: only 0.5% of memory is used, but CPU is constantly at 200% on my 16-CPU server.
Best Wishes,
Rujun, Li
From: openfastpath [mailto:[email protected]] On Behalf Of Riggs Justin
Sent: Wednesday, August 24, 2016 7:35 PM
To: [email protected]
Subject: Re: [openfastpath] the ofp-tcp cannot work when send a relatively big file in my environment
Hi,
I’ve seen the same problem with the webserver2 program. I’ve also written a program of my own, and it shows the same problem. Small files are okay, but if you get over about 10KB they will fail. I modified my own little program so that it doesn’t use the “notify()” function to get notifications (it’s strictly driven via ofp_select()), and it still has the same problem.
Another variation on this problem is that the webserver2 will eventually stop working if I try to download a lot of small files. I can usually download about a thousand files that are around 2KB, and then it stops. The size of my hugepage space or number of mbufs affects when it stops, but it always eventually stops. So in this particular scenario it seems like it’s leaking buffers, and then it stops once they’re all gone. But that’s just a guess. I haven’t tracked it all the way down yet.
Justin Riggs
From: openfastpath [mailto:[email protected]] On Behalf Of Li, Rujun
Sent: Wednesday, August 24, 2016 1:47 AM
To: [email protected]
Subject: [++SPAM++]: [openfastpath] the ofp-tcp cannot work when send a relatively big file in my environment
Hi everyone
When I run the example webserver2 in ofp and use another machine as a client to request a 100K txt file, the file is not returned correctly, and I think this makes the network stack hang, because there are no more ARP replies. But when I restart and request a 10K file, it works correctly in the same environment. Is there some limit on file size? Or is there some other problem that I haven’t noticed? Thanks!
My test command is wget http://192.168.77.76:6789/index_100.txt (I set the IP and changed the port).
Best Wishes,
Rujun, Li