When running nginx 1.18.0 with njs 0.6.0 on a high-traffic server, I noticed that some of the nginx worker processes were consuming 100% of their CPU time.
Using `lsof`, `tcpdump`, and `strace`, I verified that data was still flowing through the streams. Sampling the process in question with `perf` and analyzing the hot spots, I found that the worker spends 99.98% of its CPU time walking to the end of the `*busy` linked list in https://github.com/nginx/nginx/blob/release-1.18.0/src/core/ngx_buf.c#L195 .
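To illustrate why that line becomes a hot spot, here is a minimal sketch (toy code, not nginx source) of the access pattern: `ngx_chain_update_chains()` appends new output links to the tail of `*busy` by walking the whole list first, so if the list never drains, every call pays O(n) and the total work grows quadratically.

```c
/* Toy model of appending to a singly linked "busy" chain by walking
 * to the tail, as ngx_chain_update_chains() does. Names are
 * illustrative, not nginx's. */
#include <stddef.h>
#include <stdlib.h>

typedef struct link_s {
    struct link_s *next;
} link_t;

/* Append one link by walking to the tail; returns the number of
 * nodes traversed, which models CPU time spent per call. */
static size_t append_walking(link_t **busy, link_t *cl)
{
    size_t walked = 0;

    cl->next = NULL;

    if (*busy == NULL) {
        *busy = cl;
        return 0;
    }

    link_t *p = *busy;
    for (; p->next; p = p->next) {
        walked++;
    }
    p->next = cl;

    return walked + 1;
}

/* Append n links to an ever-growing busy chain and total the walking
 * cost: sum is n*(n-1)/2, i.e. quadratic in the chain length. */
size_t total_walk(size_t n)
{
    link_t *busy = NULL;
    size_t  total = 0;

    for (size_t i = 0; i < n; i++) {
        link_t *cl = malloc(sizeof(link_t));
        total += append_walking(&busy, cl);
    }

    while (busy) {
        link_t *next = busy->next;
        free(busy);
        busy = next;
    }

    return total;
}
```

With a chain that only grows, a few hundred thousand leaked links are enough to pin a worker at 100% CPU even though each individual buffer is tiny.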
I analyzed a core image of the worker process and confirmed that the `*busy` chain was indeed extremely long, piled up with buffers tagged `ngx_stream_proxy_module`. Some of the `chain` objects referred to the same `buf` object, and some of the `buf` objects lay within the range of `session->upstream->upstream_buf` while the others lay within `session->upstream->downstream_buf`.
I believe the cause is that in the njs stream module, `buf` objects from both directions are mixed together and appended to the same `*busy` chain. Suppose `ngx_stream_js_body_filter` is handling a chain of one `buf` object from upstream and a chain of two `buf` objects from downstream, and the first `buf` is not sent entirely and so is moved to the `*busy` chain. Even if the two following `buf` objects from downstream are fully sent, they are still appended to the `*busy` chain rather than moved to the `*free` chain, because the chain link ahead of them holds the first `buf`, which is not "empty". However, when control returns to `ngx_stream_proxy_process`, the picture is different: there the busy chain is kept separate per direction, so the two downstream `buf` objects are put onto `u->free` and become reusable. Since the `u->free` chain is last-in-first-out, these objects are reused almost immediately. But remember that there are still references to them from the `*busy` chain inside `ngx_stream_js_body_filter`, so the `chain` objects there may never be freed and begin to pile up.
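The head-of-line blocking described above can be shown with a small self-contained model (assumed names, not nginx source): recycling stops at the first buffer that still has unsent data, so fully-sent buffers queued behind a partially-sent one stay on the busy chain indefinitely.

```c
/* Toy model of the recycling rule in ngx_chain_update_chains():
 * move leading fully-sent links from *busy to *freec, but stop at
 * the first link whose buf still holds data. */
#include <stddef.h>

typedef struct {
    size_t unsent;              /* bytes left to send; 0 == fully sent */
} buf_t;

typedef struct chain_s {
    buf_t          *buf;
    struct chain_s *next;
} chain_t;

static void update_chains(chain_t **freec, chain_t **busy)
{
    while (*busy) {
        chain_t *cl = *busy;

        if (cl->buf->unsent != 0) {
            break;              /* head not drained: everything waits */
        }

        *busy = cl->next;
        cl->next = *freec;
        *freec = cl;
    }
}

/* One partially-sent upstream buf followed by two fully-sent
 * downstream bufs on the SAME busy chain: nothing is recycled. */
size_t demo_blocked(void)
{
    buf_t b1 = {5}, b2 = {0}, b3 = {0};   /* b1: 5 bytes unsent */
    chain_t c3 = {&b3, NULL};
    chain_t c2 = {&b2, &c3};
    chain_t c1 = {&b1, &c2};
    chain_t *busy = &c1, *freec = NULL;

    update_chains(&freec, &busy);

    size_t n = 0;
    for (chain_t *cl = busy; cl; cl = cl->next) {
        n++;
    }
    return n;   /* all three links remain on the busy chain */
}
```

Meanwhile `ngx_stream_proxy_process`, tracking the same downstream bufs on its own per-direction chain, happily recycles them, which is exactly the divergence of views the next paragraph describes.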
The root problem is that all modules in the buffer-processing chain should share the same view of whether the `buf` object referred to by a `chain` link is busy or not. Since the `*busy` chain is kept separate per direction in `ngx_stream_proxy_module`, we should separate it in `ngx_stream_js_module` as well.
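A self-contained model of the proposed split (field and function names are illustrative, not the actual patch): with one busy chain per direction, a partially-sent upstream buffer no longer blocks recycling of fully-sent downstream buffers.

```c
/* Toy model of keeping a separate busy chain per direction, as
 * ngx_stream_proxy_module does. Names are illustrative only. */
#include <stddef.h>

typedef struct {
    size_t unsent;              /* bytes left to send; 0 == fully sent */
} buf_t;

typedef struct chain_s {
    buf_t          *buf;
    struct chain_s *next;
} chain_t;

typedef struct {
    chain_t *busy[2];           /* [0] from upstream, [1] from downstream */
    chain_t *free;
} ctx_t;

/* Same recycling rule as before: drain fully-sent leading links. */
static void recycle_chain(chain_t **freec, chain_t **busy)
{
    while (*busy) {
        chain_t *cl = *busy;

        if (cl->buf->unsent != 0) {
            break;
        }

        *busy = cl->next;
        cl->next = *freec;
        *freec = cl;
    }
}

/* The same buffers as in the bug scenario, but routed to separate
 * per-direction busy chains: the downstream links are recycled. */
size_t demo_split(void)
{
    buf_t up = {5}, d1 = {0}, d2 = {0};
    chain_t cu = {&up, NULL};
    chain_t c2 = {&d2, NULL};
    chain_t c1 = {&d1, &c2};
    ctx_t ctx = {{&cu, &c1}, NULL};

    recycle_chain(&ctx.free, &ctx.busy[0]);  /* stays: unsent data  */
    recycle_chain(&ctx.free, &ctx.busy[1]);  /* both links recycled */

    size_t n = 0;
    for (chain_t *cl = ctx.free; cl; cl = cl->next) {
        n++;
    }
    return n;   /* downstream links made reusable */
}
```

Both modules then agree on which buffers are busy, so no link can be stranded by traffic in the opposite direction.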
Since applying the following patch in our production environment, the symptom has not recurred.
I have not found a way to reliably reproduce the problem in a test environment. Perhaps the triggering condition is assigning an empty js function to `js_filter`: such a function installs no event handlers, so `buf` objects from both directions reach the else clause of `if (event->ev != NULL)` in `ngx_stream_js_body_filter`, where those `buf` objects are directly referenced by newly created `chain` objects that are then appended to `out`, handled by `ngx_stream_top_filter`, and finally by `ngx_chain_update_chains`.
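For anyone attempting a reproduction, a guess at a minimal configuration matching that condition might look like the following (untested; the directive set assumes njs 0.6.0 with `js_import` available, and `filter.js` / `noop` are hypothetical names):

```nginx
stream {
    # filter.js would export an empty function:  function noop(s) { }
    js_import filter from filter.js;

    server {
        listen     127.0.0.1:12345;
        js_filter  filter.noop;          # installs no event handlers
        proxy_pass 127.0.0.1:12346;
    }
}
```

Sustained bidirectional traffic through such a server, with a slow enough peer that some buffers are only partially sent, should be the scenario described above.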