Hello everyone,

I have been using the newly developed `merge_single_cells()` (introduced in #219) to convert my `.sqlite` files into Parquet. I am currently running into an issue where conversions of large files (specifically over 20 GB) are killed by the operating system due to an out-of-memory (OOM) error.

The image shows that some `.sqlite` files were successfully converted into Parquet files, but the OS kernel raised a `Killed` message on `../data/SQ00014611.sqlite`.

To figure out what error caused the OS kernel to kill the process, I ran:

```shell
sudo dmesg -T | grep -E -i -B100 'killed process'
```

which returned:

```
Out of memory: Killed process 137056 (python) total-vm:58479492kB
```

Based on the message, it seems `.merge_single_cells()` at some point uses about 58.5 GB of memory on a 20.78 GB file. Here is an image of the Pop!_OS resource monitor while running `.merge_single_cells()` on the `../data/SQ00014611.sqlite` dataset, a few seconds before the OS kernel killed the job and raised the OOM error:

Below is the source code used and a download link that points to the dataset that causes this issue.
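One way to keep peak memory bounded during this kind of conversion is to stream rows out of the SQLite file in fixed-size chunks instead of loading each table whole. This is only a sketch of that idea, not pycytominer's actual implementation; the table name used below is hypothetical:

```python
import sqlite3

import pandas as pd


def sqlite_table_in_chunks(db_path: str, table: str, chunksize: int = 50_000):
    """Yield one SQLite table as a sequence of DataFrame chunks.

    With `chunksize` set, pandas returns an iterator of DataFrames,
    so peak memory is proportional to the chunk size rather than
    the full table size.
    """
    with sqlite3.connect(db_path) as conn:
        yield from pd.read_sql_query(
            f"SELECT * FROM {table}", conn, chunksize=chunksize
        )
```

Each chunk could then be appended to an on-disk Parquet writer (for example, pyarrow's `ParquetWriter`) so that the full merged table never has to live in memory at once.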
SQ00014611 dataset download link
Memory benchmarking
I conducted a memory profile using memory-profiler to locate where the high memory usage occurs. The approach is similar to the one used in #195.

This memory profile was done with a smaller dataset, SQ00014613.sqlite, in order to generate a complete profile.
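For a quick spot check without a line-by-line profiler, the standard library's `tracemalloc` can report peak Python-level allocations around a single suspect call. This is a stdlib alternative to memory-profiler, not what was used above, and the 10 MB allocation is just a stand-in for the merge step:

```python
import tracemalloc


def peak_memory_of(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) of traced allocations.

    tracemalloc tracks allocations made through Python's memory APIs;
    allocations made directly by some C extensions may not be counted,
    but it is enough to compare two implementations of the same step.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example: a 10 MB allocation standing in for a merge step.
buf, peak = peak_memory_of(lambda: bytearray(10_000_000))
```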
Most of the memory consumption occurs within the merging operations [lines 130-137].

My guess is that the merging function still retains both the non-merged and the merged DataFrames. This can become an issue because the merged DataFrame requires more and more memory over time, while memory is still reserved for the non-merged datasets.

Once the merging function completes, there is a large drop in memory usage at line 140.

This Stack Overflow post describes a similar issue with the `merge` function.

I could be wrong; this is just my intuition from looking at the memory profiler report.
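If intermediate frames are indeed retained, one possible mitigation (a sketch under that assumption, not pycytominer's implementation) is to fold the tables in one at a time and release each input as soon as it has been merged:

```python
import gc

import pandas as pd


def merge_and_release(frames, on):
    """Left-to-right merge that drops each input frame once consumed.

    Removing the last reference to each consumed frame (and collecting)
    lets the allocator reuse its memory, so peak usage is roughly
    `merged + next input` rather than `merged + all inputs`.
    """
    frames = list(frames)
    merged = frames.pop(0)
    while frames:
        nxt = frames.pop(0)      # pop() removes the list's reference
        merged = merged.merge(nxt, on=on)
        del nxt                  # drop the last live reference
        gc.collect()             # encourage immediate reuse
    return merged
```

For the release to actually take effect, the caller must also avoid keeping its own references to the input frames alive after passing them in.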
Machine info
@gwaybio @d33bs