Too much memory used for a big PDF (800 pages): 80 GB of RAM. #269

Closed
whp98 opened this issue Aug 25, 2024 · 3 comments
Comments

@whp98
Author

whp98 commented Aug 25, 2024

Sometimes it also fails with a CUDA out-of-memory error.
My GPU is a 4060 Ti with 16 GB.

The PDF is this one: https://github.com/yuanliangding/books/blob/master/%E8%AE%A1%E7%AE%97%E6%9C%BA-%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80-JAVA/Java%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B%E5%AE%9E%E6%88%98.pdf

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 15.60 GiB of which 747.88 MiB is free. Including non-PyTorch memory, this process has 2.46 GiB memory in use. Of the allocated memory 2.14 GiB is allocated by PyTorch, and 166.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Error converting PDF to Markdown: Command '['marker_single', '/home/zzz/文档/PDF/Java并发编程实战.pdf', '/home/sss/dsadas/pdf-to-markdown/output']' returned non-zero exit status 1.

@frankbaele

That's not an absurd limit to run into; many PDF services have page/file limits for this reason. You can work around it by slicing your PDF into smaller files with another PDF library, converting each slice, and then joining the outputs at the end.
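
For anyone landing here, a rough sketch of the slice-and-join workaround might look like the following. It assumes the third-party pypdf package for splitting (any PDF library would do) and calls `marker_single <pdf> <output_dir>` the same way as in the traceback above. The helper names, the chunk size of 50 pages, and the part-file naming are made up for illustration and are not part of marker. Where marker writes its Markdown files is not assumed here, so the join step takes an explicit list of paths.

```python
import subprocess
from pathlib import Path


def chunk_ranges(num_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page-index ranges that cover num_pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, num_pages))
            for start in range(0, num_pages, chunk_size)]


def split_pdf(src: Path, work_dir: Path, chunk_size: int = 50) -> list[Path]:
    """Write src out as several smaller PDFs and return their paths."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so the pure-Python helpers work without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    work_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for i, (start, end) in enumerate(chunk_ranges(len(reader.pages), chunk_size)):
        writer = PdfWriter()
        for page_index in range(start, end):
            writer.add_page(reader.pages[page_index])
        # Hypothetical naming scheme; zero-padded so parts sort in page order.
        part = work_dir / f"{src.stem}_part{i:03d}.pdf"
        with part.open("wb") as fh:
            writer.write(fh)
        parts.append(part)
    return parts


def convert_parts(parts: list[Path], out_dir: Path) -> None:
    """Run marker_single on each slice, one process at a time."""
    for part in parts:
        # Same invocation shape as in the error message: input PDF, output dir.
        subprocess.run(["marker_single", str(part), str(out_dir)], check=True)


def join_markdown(md_files: list[Path], dest: Path) -> None:
    """Concatenate the per-slice Markdown files in the order given."""
    texts = [p.read_text(encoding="utf-8") for p in md_files]
    dest.write_text("\n\n".join(texts), encoding="utf-8")
```

With 800 pages and a chunk size of 50 this gives 16 slices, so each `marker_single` process only ever holds 50 pages at once. The chunk size is a guess; tune it to whatever fits in your RAM and VRAM.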

@VikParuchuri
Owner

The CPU RAM issue should be fixed now.
