Inference stuck in ...d2.evaluation.evaluator]: Start inference on X batches #90

Open
marcelogdeandrade opened this issue Oct 2, 2024 · 8 comments


@marcelogdeandrade

Hello, I'm trying to run the project locally using Docker on a 5-page PDF.

I basically ran:

$ git clone https://github.com/huridocs/pdf-document-layout-analysis
$ cd pdf-document-layout-analysis
$ docker run --rm --name pdf-document-layout-analysis -p 5060:5060 --entrypoint ./start.sh huridocs/pdf-document-layout-analysis:v0.0.14.1

The Docker container started normally, but after making a simple request:

$ curl -X POST -F 'file=@./my_file.pdf' localhost:5060

the container gets stuck at this step:

[10/02 02:50:12 d2.evaluation.evaluator]: Start inference on 5 batches

Here are some more logs:

[10/02 02:49:11 detectron2]: Full config saved to /app/model_output_doclaynet/config.yaml
[10/02 02:49:43 detectron2]: Merge using: Sum
/app/src/ditod/Wordnn_embedding.py:48: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(bros_embedding_path + "pytorch_model.bin", map_location="cpu")
use_pretrain_weight: load model from: ../models/layoutlm-base-uncased/
[10/02 02:49:51 detectron2]: Model: Trainable network params num : 243,296,319
[10/02 02:49:51 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /app/models/doclaynet_VGT_model.pth ...
[10/02 02:49:51 fvcore.common.checkpoint]: [Checkpointer] Loading from /app/models/doclaynet_VGT_model.pth ...
/app/.venv/lib/python3.11/site-packages/fvcore-0.1.5.post20221221-py3.11.egg/fvcore/common/checkpoint.py:252: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-10-02 02:49:57,081 [INFO] Is PyTorch using GPU: False
[2024-10-02 02:49:57 +0000] [11] [INFO] Started server process [11]
[2024-10-02 02:49:57 +0000] [11] [INFO] Waiting for application startup.
[2024-10-02 02:49:57 +0000] [11] [INFO] Application startup complete.
2024-10-02 02:49:57,224 [INFO] Calling endpoint: run
2024-10-02 02:49:57,224 [INFO] Processing file: prova_a1_split.pdf
2024-10-02 02:49:57,239 [INFO] Creating PDF images
Page-1
Page-2
Page-3
Page-4
Page-5
2024-10-02 02:50:12,361 [INFO] Full TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')], crop: None
WARNING [10/02 02:50:12 d2.data.datasets.coco]: /app/jsons/test.json contains 5165 annotations, but only 0 of them match to images in the file.
[10/02 02:50:12 d2.data.datasets.coco]: Loaded 5 images in COCO format from /app/jsons/test.json
[10/02 02:50:12 d2.data.build]: Distribution of instances among all 11 categories:
|  category  | #instances   |   category    | #instances   |  category   | #instances   |
|:----------:|:-------------|:-------------:|:-------------|:-----------:|:-------------|
|  Caption   | 0            |   Footnote    | 0            |   Formula   | 0            |
| List_Item  | 0            |  Page_Footer  | 0            | Page_Header | 0            |
|  Picture   | 0            | Section_Hea.. | 0            |    Table    | 0            |
|    Text    | 0            |     Title     | 0            |             |              |
|   total    | 0            |               |              |             |              |
[10/02 02:50:12 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[10/02 02:50:12 d2.data.common]: Serializing 5 elements to byte tensors and concatenating them all ...
[10/02 02:50:12 d2.data.common]: Serialized dataset takes 0.00 MiB
[10/02 02:50:12 d2.evaluation.evaluator]: Start inference on 5 batches

I'm not seeing any issues with container resources (memory or CPU). Can you help me debug this? Thanks!

@ali6parmak
Collaborator

Hi, the commands you used look fine. Can you try curling test_pdfs/regular.pdf so we can see whether that works?

@marcelogdeandrade
Author

Yes, it also gets stuck at the [10/02 13:12:27 d2.evaluation.evaluator]: Start inference on 2 batches step; it has been running for more than 30 minutes.

When I run docker stats, I see that the container's CPU utilization is really high, but it shouldn't take this long to run:

[docker stats screenshot showing very high container CPU usage]

@ali6parmak
Collaborator

ali6parmak commented Oct 3, 2024

I tried to reproduce what you are experiencing, but unfortunately I wasn't able to; the service works fine on my end.

One thing you can try is using the "fast" models to run a non-visual analysis:

curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060

You should get the response within a few seconds. Let's try that and see if it works.

Also, can you tell us which OS you are using?

Thanks

@marcelogdeandrade
Author

Running it with fast=true works fine.

I'm running it on macOS on an M2 (ARM architecture):

OS: macOS Sonoma 14.4.1 arm64
Host: MacBook Pro (16-inch, 2023)
CPU: Apple M2 Pro (12) @ 3.50 GHz
GPU: Apple M2 Pro (19) @ 1.40 GHz [Integrated]

@marcelogdeandrade
Author

I also ran colima (a Docker runtime for macOS) in Rosetta mode to run the image on the x86 architecture. The result is the same; however, it uses a lot more CPU for quite some time:

CONTAINER ID   NAME                           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O        PIDS
efa6aacd511e   pdf-document-layout-analysis   730.65%   2.929GiB / 15.61GiB   18.76%    104MB / 612kB   77.3MB / 108MB   31
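A quick, hypothetical debugging aid (not part of the project): checking which architecture the Python process inside the container actually sees. Under Rosetta emulation, an arm64 host reports x86_64 here, meaning every instruction is emulated, which is a plausible contributor to the large CPU numbers above.

```python
import platform

# What architecture does this Python process think it is running on?
# Inside an emulated x86 container on Apple Silicon this prints "x86_64".
arch = platform.machine()
print("architecture as seen by Python:", arch)
```

Running this with `docker exec` inside the container (versus on the host) makes an emulation mismatch immediately visible.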

@marcelogdeandrade
Author

marcelogdeandrade commented Oct 4, 2024

Update: after 20 minutes of running at ~1000% CPU, it seems that one inference iteration completed:

[10/04 01:24:41 d2.evaluation.evaluator]: Inference done 1/2. Dataloading: 1.7084 s/iter. Inference: 1345.5779 s/iter. Eval: 0.0077 s/iter. Total: 1347.3115 s/iter. ETA=0:22:27

It seems weird that it is this slow on the regular.pdf test file, right?
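As a sanity check on that log line, the per-iteration time it reports can be extrapolated to a total wall-time estimate with a toy helper (not from the project, just stdlib regex over the detectron2 evaluator output):

```python
import re

# The evaluator log line quoted above.
LOG = ("[10/04 01:24:41 d2.evaluation.evaluator]: Inference done 1/2. "
       "Dataloading: 1.7084 s/iter. Inference: 1345.5779 s/iter. "
       "Eval: 0.0077 s/iter. Total: 1347.3115 s/iter. ETA=0:22:27")

# Pull out the per-batch inference time and project it over all batches.
match = re.search(r"Inference: ([\d.]+) s/iter", LOG)
seconds_per_batch = float(match.group(1))
total_batches = 2
print(f"estimated total: {seconds_per_batch * total_batches / 60:.1f} minutes")
```

That projection comes out to roughly 45 minutes for the 2 batches, which lines up with the total runtime reported later in the thread.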

@marcelogdeandrade
Author

It finished running after 45 minutes:

[10/04 01:48:01 d2.evaluation.evaluator]: Inference done 2/2. Dataloading: 0.0000 s/iter. Inference: 1399.7729 s/iter. Eval: 0.0030 s/iter. Total: 1399.7759 s/iter. ETA=0:00:00
[10/04 01:48:01 d2.evaluation.evaluator]: Total inference time: 0:23:20.084540 (1400.084540 s / iter per device, on 1 devices)
[10/04 01:48:01 d2.evaluation.evaluator]: Total inference pure compute time: 0:23:19 (1399.772861 s / iter per device, on 1 devices)

@ali6parmak
Collaborator

The visual model can be quite slow when it runs on CPU, but yes, for a simple two-page document 45 minutes is definitely too much. On my setup with an Intel® Core™ i7-8700 CPU @ 3.20GHz, it takes around 36 seconds to finish.

We haven't tested the service on macOS yet, so I'm not sure what might be causing this issue. Since the service runs but takes a long time, it could be related to hardware or system-specific optimizations. If we discover any solutions, we'll be sure to update you. Similarly, if you find anything on your end, please let us know, and maybe we can make some improvements.

Thanks again
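One generic knob worth ruling out for slow CPU inference (this is a standard PyTorch setting, not something specific to this project): whether torch is actually using all available cores for intra-op parallelism inside the container.

```python
import os

# Generic PyTorch CPU tuning sketch. If the container restricts the thread
# pool, pinning it to the visible core count can help; if torch is not
# installed in the current environment, this is a no-op.
try:
    import torch
    torch.set_num_threads(os.cpu_count() or 1)
    print("torch intra-op threads:", torch.get_num_threads())
except ImportError:
    print("torch not installed in this environment")
```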
