
Memory Leak on inference #1418

Closed
TomekPro opened this issue Jan 3, 2024 · 12 comments
Labels
type: bug Something isn't working

Comments


TomekPro commented Jan 3, 2024

Bug description

Running doctr on multiple images in a loop causes a massive memory leak.
[mprof plot: memory usage growing steadily across the loop]

Code snippet to reproduce the bug

import os
import tqdm
from pathlib import Path
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)

path = Path("/path/with/jpgs")
for file in tqdm.tqdm(os.listdir(path)[0:20]):
    file_path = path / file
    doc = DocumentFile.from_images(file_path)
    result = model(doc)

Run in the following way:
mprof run python test.py
mprof plot

The problem was still present when I modified the loop so that the model was also initialized inside it.

Diving into the code, it seems that the problem is caused by the actual PyTorch inference, for example here:
[screenshot of the PyTorch inference call in the doctr source]

Error traceback

As shown in the plot above.

Environment

Tested in a clean poetry environment with just 2 packages installed:
pip install "python-doctr[torch]"
pip install memory_profiler

python 3.8.10
python-doctr 0.7.0

Ubuntu 20.04
Running on cpu

Deep Learning backend

is_tf_available: False
is_torch_available: True

@TomekPro TomekPro added the type: bug Something isn't working label Jan 3, 2024
felixdittrich92 (Contributor) commented Jan 3, 2024

Hi @TomekPro 👋,
Thanks for reporting this 👍
It should already be fixed on the main branch (v0.8.0a) -> #1357


TomekPro commented Jan 3, 2024

Hi @felixdittrich92, unfortunately this problem still occurs. I reinstalled from the main branch:

pip uninstall python_doctr
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Then pip list | grep doctr shows:
python-doctr 0.8.0a0

The plot still looks the same:
[mprof plot: same memory growth as before]


TomekPro commented Jan 3, 2024

Moving to torch 2.1 CPU-only (pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cpu) helps slightly, but the leak is still clear:
[mprof plot: slower but still steady memory growth]

felixdittrich92 (Contributor) commented Jan 3, 2024

Mh yeah, I see. Does this leak only exist for the CRNN models? Could you also test it with vitstr_small / parseq or the master models?
This would be helpful to narrow down the bug.
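
For reference, swapping the recognition architecture in the profiling script would look roughly like this (a sketch only; the architecture names are the ones mentioned above, passed via the reco_arch argument of ocr_predictor):

from doctr.models import ocr_predictor

# Sketch: rebuild the predictor with a different recognition architecture
# and re-run the same profiling loop as in the original snippet.
for reco_arch in ("crnn_vgg16_bn", "parseq", "vitstr_small", "master"):
    model = ocr_predictor(reco_arch=reco_arch, pretrained=True)
    # ... feed the same images as in the original loop and record with mprof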


TomekPro commented Jan 3, 2024

I just tested it: the problem occurs for other recognition architectures as well, which makes me think it is something about how PyTorch is used in doctr. I'm looking for a solution as well.
Parseq - just slightly better than crnn:
[mprof plot]

vitstr_small - the same:
[mprof plot]

master - the same:
[mprof plot]

torch 2.1.1+cpu


felixdittrich92 commented Jan 3, 2024

@TomekPro Have you tried passing the paths as a list, doc = DocumentFile.from_images([os.path.join(root, file) for file in os.listdir(root)]),
and specifying the batch sizes depending on your hardware, for example
model = ocr_predictor(pretrained=True, det_bs=4, reco_bs=512)?

After seeing your plots I agree that this is still a bug (maybe in PyTorch), so this is only an idea you could try in the meantime.
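
Put together, that suggestion would read roughly as follows (a sketch only; the batch sizes are the example values from above and should be tuned to the available hardware):

import os

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

root = "/path/with/jpgs"

# Build the predictor once, with explicit detection/recognition batch sizes.
model = ocr_predictor(pretrained=True, det_bs=4, reco_bs=512)

# Hand all images to the predictor in one call and let it batch them internally,
# instead of invoking the model once per image.
doc = DocumentFile.from_images([os.path.join(root, file) for file in os.listdir(root)])
result = model(doc)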


felixdittrich92 commented Jan 3, 2024

And another thing you can try: #1356 (comment)


TomekPro commented Jan 3, 2024

Yes, I tried env variables like ONEDNN_PRIMITIVE_CACHE_CAPACITY and the like, without success :/
Regarding the approach you suggested:

# imports and path as in the original snippet
model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images([os.path.join(path, file) for file in os.listdir(path)[0:20]])
result = model(doc)

It produces a really strange result: overall memory consumption is the same, it just hits the maximum very quickly and then stays stable. Maybe this could be a clue? When increasing the batch size it gets even higher. Overall, I agree that this is probably a PyTorch thing, but it still makes doctr really hard to use in real life, as the application would need to be restarted very often to avoid crashing from exceeding memory.
[mprof plot: memory jumps to its maximum early and then stays flat]

felixdittrich92 (Contributor) commented Jan 3, 2024

You can also disable multiprocessing, which should lower the RAM usage a bit.
See point 1: https://mindee.github.io/doctr/using_doctr/running_on_aws.html

felixdittrich92 (Contributor) commented Jan 3, 2024

But yeah, I think we need to profile it in more detail again to find the real bottleneck.


TomekPro commented Jan 3, 2024

@felixdittrich92 finally, three things are needed to fix this memory leak (a combined sketch follows below):

  1. export DOCTR_MULTIPROCESSING_DISABLE=TRUE
  2. export ONEDNN_PRIMITIVE_CACHE_CAPACITY=1
  3. Upgrade torch to 2.1 (in my case the cpu-only version): pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cpu

[mprof plot: memory usage stays stable after applying the three steps]
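
For completeness, a minimal sketch of how steps 1 and 2 can be set from inside the profiling script instead of the shell (assumption: the variables must be set before torch and doctr are imported so that oneDNN and doctr pick them up at initialization; exporting them in the shell as listed above remains the safer option):

import os

# Workaround env vars from steps 1 and 2 above; set them before importing
# torch/doctr so they are read at library initialization time (assumption).
os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
os.environ["ONEDNN_PRIMITIVE_CACHE_CAPACITY"] = "1"

from pathlib import Path

from doctr.io import DocumentFile
from doctr.models import ocr_predictor  # torch 2.1.1+cpu installed as in step 3

model = ocr_predictor(pretrained=True)
path = Path("/path/with/jpgs")
for file_path in sorted(path.glob("*.jpg")):
    doc = DocumentFile.from_images(file_path)
    result = model(doc)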

Thanks for your help :)


felixdittrich92 commented Jan 3, 2024

Nice 👍 Btw. using a smaller detection model, for example db_mobilenet_v3_large, will reduce the memory usage further if it still works well enough for your use case.
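
For reference, that swap is a one-line change (a sketch; db_mobilenet_v3_large is the detection architecture named above, the recognition model stays at its default):

from doctr.models import ocr_predictor

# Lighter MobileNetV3-based detection backbone to reduce memory usage;
# the accuracy trade-off depends on the documents being processed.
model = ocr_predictor(det_arch="db_mobilenet_v3_large", pretrained=True)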

@mindee mindee locked and limited conversation to collaborators Jan 4, 2024
@felixdittrich92 felixdittrich92 converted this issue into discussion #1422 Jan 4, 2024
