
Abrupt Termination (Without any error) on Google Colab, AWS EC2 #270

Closed
G999n opened this issue Aug 25, 2024 · 4 comments
G999n commented Aug 25, 2024

The conversion process abruptly terminates, without any error message, at random points during the Detecting Boxes stage on both Google Colab and an AWS EC2 instance (Windows). The percentage at which it stops varies from run to run.

AWS EC2: (screenshot of the run terminating mid-progress)

Google Colab: (screenshot of the run terminating mid-progress)

Document size is 198 pages with a mixture of selectable text, scanned text, screenshots of certificates, tables, scanned images of printed tables, etc.

@frankbaele

What was the file size? How many pages? It could be that the instance runs out of memory.


G999n commented Aug 26, 2024

> What was the file size? How many pages? It could be that the instance runs out of memory.

Document size is 198 pages with a mixture of selectable text, scanned text, screenshots of certificates, tables, scanned images of printed tables, etc.

The file size is 15.6 MB.
As per the instructions, I was using freeRAM // 3 as the batch_multiplier:
--batch_multiplier 3 on Colab (which had 11 GB of free RAM)
--batch_multiplier 2 (and then 1) on AWS EC2 (which had 8 GB of RAM)
However, both of the above were CPU instances; I wasn't using a GPU on either Colab or EC2.
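The freeRAM // 3 rule of thumb above can be written as a tiny helper (`pick_batch_multiplier` is a hypothetical name; the roughly-3-GB-per-batch figure is the one quoted later in this thread):

```python
def pick_batch_multiplier(free_ram_gb: int) -> int:
    """Rule of thumb from this thread: each batch needs roughly 3 GB
    of RAM, so use free RAM (in GB) // 3, floored at 1."""
    return max(1, free_ram_gb // 3)

print(pick_batch_multiplier(11))  # Colab with 11 GB free -> 3
print(pick_batch_multiplier(8))   # EC2 with 8 GB of RAM  -> 2
```

These match the values used on Colab and EC2 above.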

The conversion worked fine on a vast.ai JupyterLab instance with an RTX 4090 (24 GB VRAM) and 32 GB RAM. I used --batch_multiplier 7 there.

Apart from the memory required for the batches (roughly 3 GB per batch), I had assumed the program would need only a small, constant amount of memory regardless of the PDF's size. Is that not the case?


frankbaele commented Aug 27, 2024

VRAM usage is bounded and will not grow with page count, but RAM usage will.

A workaround would be to slice your PDF into smaller chunks with PyMuPDF and merge the results.
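A minimal sketch of that workaround, assuming PyMuPDF (`pip install pymupdf`) is available; `split_pdf`, `page_ranges`, and the chunk size are illustrative names, not part of the converter or PyMuPDF:

```python
def page_ranges(total_pages, chunk_size):
    """Yield inclusive 0-based (start, end) page ranges of up to chunk_size pages."""
    for start in range(0, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages) - 1

def split_pdf(path, chunk_size=50, out_prefix="chunk"):
    """Split a PDF into smaller PDFs; convert each separately,
    then concatenate the resulting outputs in order."""
    import fitz  # PyMuPDF
    src = fitz.open(path)
    out_paths = []
    for i, (first, last) in enumerate(page_ranges(src.page_count, chunk_size)):
        part = fitz.open()  # new empty PDF
        part.insert_pdf(src, from_page=first, to_page=last)
        out = f"{out_prefix}_{i:03d}.pdf"
        part.save(out)
        part.close()
        out_paths.append(out)
    src.close()
    return out_paths
```

For the 198-page document above, a chunk size of 50 would yield four files; run the converter on each and join the outputs in page order.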


G999n commented Aug 27, 2024

All right, thanks a lot.

@G999n G999n closed this as completed Aug 27, 2024