Massive memory leak in Julia due to tiny memory leak in C #55794

Open
Yixiao-Zhang opened this issue Sep 17, 2024 · 6 comments
Labels
GC Garbage collector

Comments

@Yixiao-Zhang

This is copied from my post on the Julia Discourse.

The latest version (v4.9.2) of netCDF-C (a C library for reading and writing data in the Network Common Data Form) is known to have a memory leak. The leaked memory is no more than several MB (see this link). However, when I use NCDatasets.jl, a Julia wrapper for netCDF-C, the leaked memory reaches several GB in the demo below:

using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)
    close(dset)
end


function main()

    create_dummpy_netcdf()
    for i in 1:1000
        output_dummy_netcdf()

        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
    end

end

main()

The total memory usage reaches 8.2 GiB in the end, which is roughly 1000 times the size of one zeros(Float64, 1024, 1024) allocation. Removing the zeros(Float64, 1024, 1024) call from output_dummy_netcdf limits the total memory usage to 400 MiB. It seems to me that the memory leak in the C code is "amplified" by allocations in Julia.
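The "roughly 1000 times" estimate can be checked with a few lines of arithmetic. This is only a sanity check of the sizes involved; the variable names are illustrative and not part of the original demo.

```julia
# Size of one zeros(Float64, 1024, 1024) array, in bytes.
array_bytes = sizeof(Float64) * 1024 * 1024

# One array in MiB, and 1000 such arrays in GiB.
array_mib = array_bytes / 2^20
total_gib = 1000 * array_bytes / 2^30

println("One array:   ", array_mib, " MiB")
println("1000 arrays: ", total_gib, " GiB")
```

If every one of the 1000 arrays stayed resident, that alone would account for about 7.8 GiB, which is close to the 8.2 GiB observed once the roughly 400 MiB baseline is added.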

This issue has also been posted as Alexander-Barth/NCDatasets.jl#266 (with version info). @Alexander-Barth found that the bug can be reproduced simply by ccall-ing functions in libnetcdf.so. I am bringing the discussion here because I think it is related to how Julia manages memory.

I have also tried using Valgrind to profile the heap memory usage. However, when running with Valgrind, this bug cannot be reproduced. If you know a better tool for profiling memory in Julia, please let me know.
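One lightweight complement to a heap profiler is to log the current resident set size next to the peak value. Sys.maxrss() reports the maximum RSS, so it can never go down even if memory is later returned to the OS. The sketch below assumes a Linux system with /proc mounted; current_rss_mib is a hypothetical helper name, not a function from Julia or NCDatasets.jl.

```julia
# Hypothetical helper: current (not peak) RSS in MiB, read from /proc/self/status.
# Returns `nothing` on systems without /proc (e.g. macOS, Windows).
function current_rss_mib()
    path = "/proc/self/status"
    isfile(path) || return nothing
    for line in eachline(path)
        if startswith(line, "VmRSS:")
            # The line looks like "VmRSS:    123456 kB"
            kib = parse(Int, split(line)[2])
            return kib / 1024
        end
    end
    return nothing
end

println("Current RSS: ", current_rss_mib(), " MiB")
println("Max. RSS:    ", Sys.maxrss() / 2^20, " MiB")
```

Logging both values inside the loop would show whether RSS ever drops after a garbage collection or only grows.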

@d-netto
Member

d-netto commented Sep 17, 2024

Copying the versioninfo from Alexander-Barth/NCDatasets.jl#266:

Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info: Official https://julialang.org/ release
Platform Info: OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, rocketlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

Tested both on an M2 and on a Linux x86_64 machine. Could reproduce it on Linux, but the increase in RSS seems to be much less severe on Mac.

  • Tweaked MWE:
using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)
    close(dset)
end


function main()

    create_dummpy_netcdf()
    for i in 1:1_000
        output_dummy_netcdf()
        @info Printf.@sprintf "Live bytes:  %9.3f MiB\n" Base.gc_live_bytes()/2^20
        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
    end

end

main()
  • Mac:
Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
...
[ Info: Live bytes:     29.729 MiB
[ Info: Max. RSS:    631.516 MiB
[ Info: Live bytes:     37.745 MiB
[ Info: Max. RSS:    631.516 MiB
[ Info: Live bytes:     45.760 MiB
[ Info: Max. RSS:    631.516 MiB
[ Info: Live bytes:     53.776 MiB
[ Info: Max. RSS:    631.516 MiB
[ Info: Live bytes:     13.699 MiB
[ Info: Max. RSS:    631.516 MiB
[ Info: Live bytes:     21.714 MiB
[ Info: Max. RSS:    631.531 MiB
[ Info: Live bytes:     29.730 MiB
[ Info: Max. RSS:    631.531 MiB
[ Info: Live bytes:     37.745 MiB
[ Info: Max. RSS:    631.531 MiB
  • Linux x86_64:
Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 36 virtual cores)
...
[ Info: Live bytes:     21.750 MiB
[ Info: Max. RSS:   8097.000 MiB
[ Info: Live bytes:     29.763 MiB
[ Info: Max. RSS:   8104.992 MiB
[ Info: Live bytes:     37.776 MiB
[ Info: Max. RSS:   8112.984 MiB
[ Info: Live bytes:     45.789 MiB
[ Info: Max. RSS:   8120.977 MiB
[ Info: Live bytes:     53.802 MiB
[ Info: Max. RSS:   8128.969 MiB
[ Info: Live bytes:     13.737 MiB
[ Info: Max. RSS:   8136.961 MiB
[ Info: Live bytes:     21.750 MiB
[ Info: Max. RSS:   8144.953 MiB
[ Info: Live bytes:     29.763 MiB
[ Info: Max. RSS:   8152.945 MiB
[ Info: Live bytes:     37.776 MiB
[ Info: Max. RSS:   8160.938 MiB

@d-netto
Member

d-netto commented Sep 17, 2024

Wondering if this could be related to something specific to glibc itself?

(E.g. maybe glibc's allocator fragments more than Apple's, or keeps more pages around and is lazier than Apple's about returning them to the OS, etc.)
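One way to probe this hypothesis is to ask glibc to return free heap pages to the OS and see whether RSS growth changes. The sketch below calls glibc's malloc_trim through ccall; trim_glibc_heap is a hypothetical helper name, and the whole thing is an experiment under the assumption of a glibc-based Linux system, not a confirmed fix for this issue.

```julia
# Hypothetical helper: ask glibc to release free heap memory back to the OS.
# C signature: int malloc_trim(size_t pad); returns 1 if memory was released.
# Returns false on non-Linux systems or if the symbol is unavailable (e.g. musl).
function trim_glibc_heap()
    Sys.islinux() || return false
    try
        return ccall(:malloc_trim, Cint, (Csize_t,), 0) == 1
    catch
        return false
    end
end

println("Memory released: ", trim_glibc_heap())
```

Calling this after close(dset) in each iteration and comparing the RSS curve with the untrimmed run would indicate whether pages retained by the allocator explain the growth.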

@d-netto d-netto added the GC Garbage collector label Sep 17, 2024
@giordano
Contributor

You mean something like #42566?

@d-netto
Member

d-netto commented Sep 17, 2024

Just to confirm this leak is not coming from the pool allocator...

Added this patch to v1.10.4:

diff --git a/src/gc-pages.c b/src/gc-pages.c
index 682e76611f..be2ec0462a 100644
--- a/src/gc-pages.c
+++ b/src/gc-pages.c
@@ -9,6 +9,11 @@
 extern "C" {
 #endif
 
+JL_DLLEXPORT uint64_t jl_get_pg_size(void)
+{
+    return GC_PAGE_SZ;
+}
+
 // Try to allocate memory in chunks to permit faster allocation
 // and improve memory locality of the pools
 #ifdef _P64
@@ -19,6 +24,12 @@ extern "C" {
 #define MIN_BLOCK_PG_ALLOC (1) // 16 KB
 
 static int block_pg_cnt = DEFAULT_BLOCK_PG_ALLOC;
+static _Atomic(size_t) current_pg_count = 0;
+
+JL_DLLEXPORT uint64_t jl_current_pg_count(void)
+{
+    return (uint64_t)jl_atomic_load(&current_pg_count);
+}
 
 void jl_gc_init_page(void)
 {
@@ -148,6 +159,7 @@ exit:
     SetLastError(last_error);
 #endif
     errno = last_errno;
+    jl_atomic_fetch_add(&current_pg_count, 1);
     return meta;
 }
 
@@ -188,6 +200,7 @@ void jl_gc_free_page(jl_gc_pagemeta_t *pg) JL_NOTSAFEPOINT
     madvise(p, decommit_size, MADV_DONTNEED);
 #endif
     msan_unpoison(p, decommit_size);
+    jl_atomic_fetch_add(&current_pg_count, -1);
 }
 
 #ifdef __cplusplus

and then ran this MWE:

using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)
    close(dset)
end


function main()

    create_dummpy_netcdf()
    for i in 1:1_000
        output_dummy_netcdf()
        @info Printf.@sprintf "Live bytes:  %9.3f MiB\n" Base.gc_live_bytes()/2^20
        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
        @info Printf.@sprintf "Current page count: %d\n" @ccall jl_current_pg_count()::UInt64
    end
end

main()

Current page count was fairly stable:

...
[ Info: Live bytes:     53.916 MiB
[ Info: Max. RSS:   8102.328 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     13.822 MiB
[ Info: Max. RSS:   8110.320 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     21.841 MiB
[ Info: Max. RSS:   8118.312 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     29.860 MiB
[ Info: Max. RSS:   8126.305 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     37.879 MiB
[ Info: Max. RSS:   8134.297 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     45.898 MiB
[ Info: Max. RSS:   8142.289 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     53.916 MiB
[ Info: Max. RSS:   8150.281 MiB
[ Info: Current page count: 1980
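To put the page count in perspective, it can be converted into a memory footprint. The sketch below assumes a GC page size of 16 KiB, taken from the "// 16 KB" comment in the patch above; on a patched build the exact value could be queried through the jl_get_pg_size function that the patch adds.

```julia
# Page count from the log above.
page_count = 1980

# Assumption: 16 KiB per GC page, per the "// 16 KB" comment in the patch.
page_size_bytes = 16 * 1024

# Memory held by the pool allocator, in MiB.
pool_mib = page_count * page_size_bytes / 2^20

println("Pool allocator footprint: ", pool_mib, " MiB")
```

Under that assumption the pool allocator holds about 31 MiB, which is negligible next to the roughly 8 GiB of RSS reported in the same log.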

@d-netto
Member

d-netto commented Sep 17, 2024

> You mean something like #42566?

Possibly.

@vchuravy
Member

I get:

[ Info: Max. RSS:    478.117 MiB
[ Info: Max. RSS:    478.117 MiB
  0.932792 seconds (753.56 k allocations: 7.865 GiB, 18.04% gc time, 22.93% compilation time)
julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
