Document has no content, skipping analysis #222

Baael · 2025-01-24T07:36:07Z

Describe the bug
After a few hours of operation, a series of PDF documents (originating from emails) appears marked as "no content," causing Paperless AI to get stuck in a loop, repeatedly trying to parse the same documents.

How to Reproduce
Have around 25 unparsable documents in the processing queue.

Expected behavior
The documents should be marked as unprocessable and skipped in the next loop.
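
A minimal sketch of what "mark and skip" could look like, written in Python purely for illustration. This is not paperless-ai's actual code: the `Document` class, the `select_for_analysis` function, and the `unprocessable_ids` set are all hypothetical names, and the sketch assumes an empty or whitespace-only `content` field is what triggers the "no content" message.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for a document record: an ID plus extracted text.
    id: int
    content: str


def select_for_analysis(documents, unprocessable_ids):
    """Return the documents worth analysing and remember the empty ones.

    `unprocessable_ids` is a set that is assumed to persist between loops
    (in practice it could be a database table or a tag on the document).
    """
    to_analyse = []
    for doc in documents:
        if doc.id in unprocessable_ids:
            continue  # already known to be empty: skip without re-checking
        if not doc.content.strip():
            unprocessable_ids.add(doc.id)  # mark once, skip from now on
            continue
        to_analyse.append(doc)
    return to_analyse
```

With something like this, a document without content is checked once, recorded, and then left out of every later loop instead of being reported again on each run.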

Screenshots

[DEBUG] Found existing tag "unprocessed" via API with ID 40
[DEBUG] Filtering documents for tag IDs: [ 40 ]
[DEBUG] Fetched page 1, got 73 documents. [DEBUG] Total so far: 73
[DEBUG] Fetched page 2, got 100 tags. [DEBUG] Total so far: 200
[DEBUG] Finished fetching. Found 73 documents.
[DEBUG] Fetched page 3, got 96 tags. [DEBUG] Total so far: 296
[DEBUG] Document 2504 has no content, skipping analysis
[DEBUG] Document 2509 has no content, skipping analysis
[DEBUG] Document 2516 has no content, skipping analysis
[DEBUG] Document 2517 has no content, skipping analysis
[DEBUG] Document 2532 has no content, skipping analysis
[DEBUG] Document 2552 has no content, skipping analysis
[DEBUG] Document 2553 has no content, skipping analysis
[DEBUG] Document 2597 has no content, skipping analysis
[DEBUG] Document 2598 has no content, skipping analysis
[DEBUG] Document 2626 has no content, skipping analysis
[DEBUG] Document 2652 has no content, skipping analysis
[DEBUG] Document 2653 has no content, skipping analysis
[DEBUG] Document 2732 has no content, skipping analysis
[DEBUG] Document 2733 has no content, skipping analysis
[DEBUG] Document 2788 has no content, skipping analysis
[DEBUG] Document 2789 has no content, skipping analysis
[DEBUG] Document 2801 has no content, skipping analysis
[DEBUG] Document 2893 has no content, skipping analysis
[DEBUG] Document 2894 has no content, skipping analysis
[DEBUG] Document 2895 has no content, skipping analysis
[DEBUG] Document 2924 has no content, skipping analysis
[DEBUG] Document 2925 has no content, skipping analysis
[DEBUG] Document 2926 has no content, skipping analysis
[DEBUG] Document 2946 has no content, skipping analysis
[INFO] Task completed
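
To check whether the same documents are retried on every run, the skipped IDs can be pulled out of two consecutive logs and compared. A small sketch in Python, assuming the log lines look exactly like the ones above; the `skipped_ids` helper is a hypothetical name and not part of paperless-ai.

```python
import re

# Matches lines such as:
# [DEBUG] Document 2504 has no content, skipping analysis
SKIP_PATTERN = re.compile(
    r"\[DEBUG\] Document (\d+) has no content, skipping analysis"
)


def skipped_ids(log_text):
    """Return the IDs of documents reported as having no content, in log order."""
    return [int(doc_id) for doc_id in SKIP_PATTERN.findall(log_text)]
```

If `set(skipped_ids(run_a)) == set(skipped_ids(run_b))` for two runs, the same documents are being picked up and skipped again each time.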

Additional context
The affected documents come in via email and are password-protected or encrypted.

I just found that after ~500 documents, paperless-ai was stuck on them for the whole night :)


clusterzx · 2025-01-24T07:45:43Z

Hmm, I mean that check should only be a matter of a few seconds, so I would not expect reprocessing these files to take even a minute.
But the "problem" is that to change this I would also have to rewrite the whole database structure, the dashboard frontend, and the main scanning function. I mean, it's an edge case right now, as this occurs very rarely.

I understand your issue, but I don't see it stuck there. Stuck would mean that the application does not do anything after that specific case. But looking at the [INFO] Task completed line, it seems to work as expected.

"I just found that after ~500 documents, paperless-ai was stuck on them for the whole night :)"
What do you mean by "for a whole night"?

Baael · 2025-01-24T07:53:16Z

I have around 1000 documents to process, and 480 of them are still tagged "unprocessed".

"I understand your issue, but I don't see it stuck there. Stuck would mean that the application does not do anything after that specific case. But looking at the [INFO] Task completed line, it seems to work as expected."

I will check everything again, because I once more have a feeling that I was too fast and that I am the bug.
