@phibeck reported this bug to me and I have made an MWE of a segfault. setindex! on a dataset with an array of a compound datatype appears to be thread unsafe.
This is the MWE:
using HDF5

T = ComplexF64
# T = Float64 # OK
sizes = (100, 100, 100)
Threads.@threads for i in 1:Threads.nthreads()
    h5open("file$i.hdf5", "w") do h5
        d = create_dataset(h5, "rand", T, sizes)
        for n in 1:sizes[3]
            d[:,:,n] = rand(T, sizes[1:2])
        end
    end
end
This is the segfault, although the message is different every time
$ julia --project --threads 6 mwe.jl
[400950] signal 7 (128): Bus error
in expression starting at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:6
H5FL_reg_malloc at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5FL_reg_calloc at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5CX_push at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
H5Sget_simple_extent_type at /home/lxvm/.julia/artifacts/3f844d84068534dcd6606936ab5f28e1120a9bb0/lib/libhdf5.so (unknown line)
h5s_get_simple_extent_type at /home/lxvm/.julia/packages/HDF5/Z859u/src/api/functions.jl:6716
setindex! at /home/lxvm/.julia/packages/HDF5/Z859u/src/datasets.jl:368
unknown function(ip: 0x7f6496319060)
#2 at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:10
#17 at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:101
task_local_storage at ./task.jl:315
#h5open#16 at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:96 [inlined]
h5open at /home/lxvm/.julia/packages/HDF5/Z859u/src/file.jl:94 [inlined]
macro expansion at /local/home/lxvm/projects/autobz/hdf5/mwe.jl:7 [inlined]
#26#threadsfor_fun#1 at ./threadingconstructs.jl:252
#26#threadsfor_fun at ./threadingconstructs.jl:219 [inlined]
#1 at ./threadingconstructs.jl:154
unknown function(ip: 0x7f649630ba9f)
jl_apply at /cache/build/builder-demeter6-6/julialang/julia-master/src/julia.h:2157 [inlined]
start_task at /cache/build/builder-demeter6-6/julialang/julia-master/src/task.c:1202
Allocations: 4069986 (Pool: 4069810; Big: 176); GC: 6
Bus error
In @phibeck's bug report the segfault showed a trace through setindex! -> get_jl_type -> get_mem_compatible_jl_type -> h5t_get_member_name, and I noticed there was no lock on the library call in this function (HDF5.jl/src/api/helpers.jl, lines 982 to 989 in 1a24872):
See `libhdf5` documentation for [`H5Oopen`](https://portal.hdfgroup.org/display/HDF5/H5T_GET_MEMBER_NAME).
"""
function h5t_get_member_name(type_id, index)
It seems that calling setindex! with an array of a compound datatype, such as ComplexF64, hits this codepath and consistently segfaults.
Although our code could be rewritten to avoid multi-threaded hdf5 calls, I am hoping to identify and fix the issue.
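For anyone hitting the same crash, here is a rough sketch of what "avoiding multi-threaded hdf5 calls" could look like for the MWE above. It assumes the expensive work can be separated from the I/O and that all results fit in memory; the `results` buffer and `nfiles` are names introduced only for this illustration, and `rand` stands in for the real computation.

```julia
using HDF5

T = ComplexF64
sizes = (100, 100, 100)
nfiles = Threads.nthreads()

# Parallel part: pure computation, no HDF5 object is touched here.
# `results` is a hypothetical in-memory buffer, one array per output file.
results = Vector{Array{T,3}}(undef, nfiles)
Threads.@threads for i in 1:nfiles
    results[i] = rand(T, sizes)  # stand-in for the expensive work
end

# Serial part: a single task makes every call into libhdf5.
for i in 1:nfiles
    h5open("file$i.hdf5", "w") do h5
        d = create_dataset(h5, "rand", T, sizes)
        for n in 1:sizes[3]
            d[:,:,n] = results[i][:,:,n]
        end
    end
end
```

This trades memory for safety. If the results do not fit in memory, the threads would have to hand slices to one writer task instead, which is not shown here.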
For reference, I ran this on linux and this is the environment:
lxvm changed the title from "threaded segfault in setindex! with compound types" to "threaded segfault in setindex! with arrays of compound-typed elements" on Nov 15, 2024
The entire C library is not thread safe. You are correct that in this case the API lock on the Julia side has been bypassed. I suppose we can make it not segfault, but nonetheless using threads with HDF5 is not really going to work.
In our application the overhead of the API lock is negligible. We are usually calculating expensive integrals, each taking at least a minute per thread, that return 3x3 complex matrices. We only need the real part of this matrix, and could avoid segfaults by taking the real part. We could also rewrite our code in a distributed manner since it is embarrassingly parallel. However, if all that is required to avoid these segfaults is to wrap each ccall in src/api/helpers.jl with a lock and try block, like src/api/functions.jl does, then I am happy to make a PR, because it looks like there are only 7 thread-unsafe ccalls.
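To make the proposal concrete, here is a minimal sketch of the lock-and-try pattern I have in mind. It does not use HDF5.jl at all: `demo_liblock` and `locked_strlen` are hypothetical names, and libc's `strlen` stands in for an unguarded libhdf5 call, so the real lock and the real helpers in the package may look different.

```julia
# Hypothetical stand-in for a library-wide lock (not the actual HDF5.jl name).
const demo_liblock = ReentrantLock()

# Stand-in for an unguarded helper: a raw ccall, here into libc's strlen.
function locked_strlen(s::AbstractString)
    lock(demo_liblock)
    try
        # Only one task at a time can be inside the C call.
        return ccall(:strlen, Csize_t, (Cstring,), s)
    finally
        # The lock is released even if the call or a conversion throws.
        unlock(demo_liblock)
    end
end

# Call the wrapped function from many threads at once.
lengths = zeros(Int, 64)
Threads.@threads for i in 1:64
    lengths[i] = locked_strlen("x"^i)
end
```

Using a `ReentrantLock` means a helper that already holds the lock can call another locked wrapper on the same task without deadlocking, which matters if the helpers in helpers.jl call the generated wrappers internally.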