Memory leak under heavy load #545
Comments
If I add `Conduit_lwt_server.set_max_active 1_000;` inside the `let server http_port = ...` definition …
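For orientation, here is a minimal sketch (not from this thread) of where such a call would sit in a typical cohttp-lwt-unix server; the callback body and port wiring are illustrative placeholders, only the `Conduit_lwt_server.set_max_active` call itself comes from the comment above:

```ocaml
(* Sketch only: cap the number of simultaneously active connections
   handled by conduit before starting the cohttp server. *)
let server http_port =
  Conduit_lwt_server.set_max_active 1_000;
  let callback _conn _req _body =
    (* placeholder handler *)
    Cohttp_lwt_unix.Server.respond_string ~status:`OK ~body:"hello" ()
  in
  Cohttp_lwt_unix.Server.create
    ~mode:(`TCP (`Port http_port))
    (Cohttp_lwt_unix.Server.make ~callback ())
```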
I take back my comment that the leak isn't eliminated when …
I observe a memory leak with cohttp 1.0.2 using mirage-cohttp, conduit 1.0.3 (mirage-conduit 3.0.1), and OCaml 4.06.0. Furthermore, I observe failures such as: … which seem to originate from cohttp-lwt/server.ml in …
I doubt this memory leak occurs in Lwt; or rather, it is not really a memory leak but an implementation strategy. The minimal zero-dependency HTTP server below can also reproduce a similar 'problem' after …

System and environment: Debian 11 amd64, 8 cores, 16 GB RAM, OCaml 4.14.0, Lwt 5.5.0.

dune:

```
(executable
 (public_name hello)
 (libraries lwt lwt.unix)
 (preprocess (pps lwt_ppx)))
```

hello.ml:

```ocaml
open Lwt

(* force a full major collection and compaction every 5 seconds *)
let rec gc () =
  Lwt_unix.sleep 5.;%lwt
  print_endline "full major compact";
  Gc.compact ();
  gc ()

let rec read_request ic =
  (* read and drop the HTTP Request-Line, all the request headers and the final CRLF *)
  let%lwt s = Lwt_io.read_line ic in
  let len = String.length s in
  if len > 0 then read_request ic
  else return ()

let handler _addr (ic, oc) =
  let msg = "hello" in
  (try%lwt read_request ic with _ -> return ());%lwt
  Lwt_io.fprint oc "HTTP/1.0 200 OK\r\n";%lwt
  Lwt_io.fprintf oc "Content-Length:%d\r\n" (String.length msg);%lwt
  Lwt_io.fprint oc "Content-Type:text/html\r\n";%lwt
  Lwt_io.fprint oc "\r\n";%lwt
  Lwt_io.fprint oc msg;%lwt
  Lwt_io.flush oc;%lwt
  return ()

let main () =
  async gc;
  let sockaddr = Lwt_unix.ADDR_INET (Unix.inet_addr_any, 8000) in
  let%lwt server =
    Lwt_io.establish_server_with_client_address
      (* channels and sockets are closed automatically after the handler,
         so an fd/channel leak is not possible *)
      ~no_close:false
      (* enlarge the listen backlog to reduce the probability of failed connections *)
      ~backlog:4096
      sockaddr
      handler
  in
  let%lwt _ = Lwt_io.read_line Lwt_io.stdin in
  Lwt_io.shutdown_server server

let () =
  Lwt_main.run @@ main ()
```

The number of requests doesn't affect the memory footprint. My guess is that Lwt holds promises and channel buffers in some data structure which, for performance reasons, grows on demand but doesn't shrink after the promises are resolved.
Occasionally, even a full compacting GC can't recycle enough memory, after …
I tried to rewrite the server with lwt_unix, that is, without lwt_io, and memory consumption dropped considerably. Lwt_io uses Lwt_bytes.t as its buffer, which is a Bigarray.Array1.t and relies heavily on custom C stubs. Lwt_bytes and its C stubs look like one of the memory-leak holes. After some testing, replacing some of the components in Lwt, there seem to be more memory-leak holes in the Unix/IO-related modules.
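The Lwt_unix-only version isn't included in the thread; as a rough illustration of that direction (my own sketch, not the commenter's code), the same fixed response can be served straight through Lwt_unix file descriptors, bypassing Lwt_io channels and their Lwt_bytes buffers entirely:

```ocaml
(* Sketch only: a bare-bones accept loop serving the same "hello" response
   without Lwt_io, so no per-connection channel buffers are allocated. *)
let handler_unix fd =
  let buf = Bytes.create 4096 in
  (* naive: read whatever part of the request is immediately available;
     a real server would parse until the blank line *)
  let%lwt _ = Lwt_unix.read fd buf 0 (Bytes.length buf) in
  let body = "hello" in
  let resp =
    Printf.sprintf
      "HTTP/1.0 200 OK\r\nContent-Length:%d\r\nContent-Type:text/html\r\n\r\n%s"
      (String.length body) body
  in
  (* a single write is assumed to be enough for this tiny response *)
  let%lwt _ = Lwt_unix.write_string fd resp 0 (String.length resp) in
  Lwt_unix.close fd

let main_unix () =
  let sock = Lwt_unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Lwt_unix.setsockopt sock Unix.SO_REUSEADDR true;
  let%lwt () = Lwt_unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, 8000)) in
  Lwt_unix.listen sock 4096;
  let rec accept_loop () =
    let%lwt (fd, _addr) = Lwt_unix.accept sock in
    Lwt.async (fun () -> handler_unix fd);
    accept_loop ()
  in
  accept_loop ()
```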
Lwt_bytes is fine; you're probably just encountering Bigarray-related GC issues, which are well documented. The issue is indeed with Lwt_io, but it's not related to Lwt_bytes. Lwt_io has a queuing layer for "atomic" operations (see Lwt_io.primitive) that sometimes works very poorly in practice. Taking your handler as an example, each write will enqueue itself and wait its turn until the channel is "Idle". All of this queueing overhead is quite costly (especially if we consider cancellation) and just mercilessly stresses the GC, especially given that most write operations should be extremely cheap blits to the internal buffer. You can change your server to use Lwt_io.direct_access.
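This is not the direct_access approach the comment points to, but a simpler illustration of the same point (my own sketch, reusing `read_request` and `open Lwt` from the hello.ml above): collapsing the handler's several queued writes into a single `Lwt_io.write` reduces the trips through the atomic-operation queue from six per request to one plus the flush.

```ocaml
(* Sketch: build the whole response up front and hand Lwt_io one string,
   so only one "atomic" write operation is queued per request. *)
let handler_one_write _addr (ic, oc) =
  let msg = "hello" in
  (try%lwt read_request ic with _ -> return ());%lwt
  let resp =
    Printf.sprintf
      "HTTP/1.0 200 OK\r\nContent-Length:%d\r\nContent-Type:text/html\r\n\r\n%s"
      (String.length msg) msg
  in
  Lwt_io.write oc resp;%lwt
  Lwt_io.flush oc
```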
Thanks for the explanation. Unfortunately, the performance after changing the server to use Lwt_io.direct_access is not as good as expected.
Hi @rgrinberg,
Indeed, this strategy stacks all the buffers together. When thousands of connections come in, it causes a really high peak memory usage, and the default memory allocator (glibc's malloc on Linux) doesn't release free memory back to the OS. I made a binding to <malloc.h>'s malloc_trim in lwt_unix and call it after every major GC cycle (via Gc.create_alarm) to force the glibc allocator to release its free memory. With that, the long-term memory usage is constant, about 20 MiB. So this is not so much a cohttp bug as an implementation flaw of lwt_io.
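The binding itself isn't shown in the thread; a minimal sketch of the OCaml side, assuming a hand-written C stub named `caml_malloc_trim` that wraps glibc's malloc_trim(3), could look like this:

```ocaml
(* Hypothetical external: the C stub (not shown here) would simply call
   malloc_trim(pad) from <malloc.h> and return its result. *)
external malloc_trim : int -> int = "caml_malloc_trim"

(* Install an alarm that runs at the end of every major GC cycle and asks
   glibc to hand free heap pages back to the OS. *)
let install_trim_alarm () =
  ignore (Gc.create_alarm (fun () -> ignore (malloc_trim 0)))
```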
I learned of the present issue from Caml Weekly News reporting on @kandu's comment on Discuss. It seems to point to a fundamental memory-usage issue with Lwt_io. Has this issue actually been reported to the Lwt folks? If yes, can you point to the corresponding issue there? If the issue is in fact related to Bigarray usage, note that the GC's ways of dealing with out-of-heap memory usage have improved in the past years (a few years ago, but after Lwt was initially implemented), and potential "well-known issues" may be solvable more or less easily today. (Possibly that would involve discussing with upstream ocaml-runtime folks, but it makes sense to go through Lwt people first.)
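As a concrete pointer to what "improved" might mean here (my own addition, not from the thread): since OCaml 4.08 the GC accounts for out-of-heap memory held by custom blocks, which includes Bigarray-backed Lwt_bytes buffers, through dedicated control parameters that can be tuned:

```ocaml
(* Sketch: tightening the custom-block ratios makes major collections trigger
   sooner when a lot of out-of-heap (e.g. Bigarray) memory is alive.
   The values below are illustrative, not recommendations. *)
let () =
  let c = Gc.get () in
  Gc.set { c with
           Gc.custom_major_ratio = 22;  (* default 44 *)
           Gc.custom_minor_ratio = 50;  (* default 100 *)
         }
```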
Nope, it was not. The issue was successfully worked around in the new cohttp client and servers, though.
I have had similar problems with Ocsigen, reported in: … I tried to read through the changelog of cohttp but I was not able to find this fix.
I created an upstream issue for Lwt at ocsigen/lwt#972. (I wish people more knowledgeable about the cohttp issue had done it themselves, because I couldn't give much useful information.)
@hansole the fix will be in the upcoming 6.0.0. The first alpha release is on opam-repository and will likely be merged soon.
Note that to address the problem on the server side, you will need to switch to cohttp-server-lwt-unix. |
Tested with OCaml 4.04.0+flambda, cohttp 0.22.0, conduit 0.15.0, Lwt with libev, on a CentOS 7 64-bit VM, then hitting the server with ab on the same system (this may require ulimit adjustments).

In my tests, ab gets through ~99% of the requests just fine, but the last few hang for a bit and the cohttp server process jumps from under 20 MB of RAM used to over 150 MB. Repeating the ab invocation shows the same behavior: ~99% of requests complete, then RAM usage jumps for the cohttp server process. The cohttp RAM use never drops back down, so there seems to be a resource leak somewhere.