Document has no content, skipping analysis #222

Open
Baael opened this issue Jan 24, 2025 · 2 comments

Comments

@Baael

Baael commented Jan 24, 2025

Describe the bug
After a few hours of operation, a series of PDF documents (originating from emails) appears marked as "no content," causing Paperless AI to get stuck in a loop, repeatedly trying to parse the same documents.

How to Reproduce
Just have around 25 unparsable documents.

Expected behavior
The documents should be marked as unprocessable and skipped in the next loop.
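
A minimal sketch of the behavior described above, under stated assumptions: the names `Document`, `scan_once`, and `unprocessable_ids` are hypothetical and not part of paperless-ai (which is a Node.js application), and the in-memory set stands in for whatever persistent marker (a tag, a database row) a real implementation would use. Python is used here only for brevity.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a document fetched from the Paperless API."""
    id: int
    content: str


def scan_once(documents, unprocessable_ids):
    """Run one scan loop and return the IDs that were analyzed.

    Documents without content are recorded in `unprocessable_ids` so that
    later loops skip them instead of re-checking them every time.
    """
    analyzed = []
    for doc in documents:
        if doc.id in unprocessable_ids:
            continue  # already known to be empty, skip without re-checking
        if not doc.content.strip():
            unprocessable_ids.add(doc.id)  # remember for the next loop
            continue
        analyzed.append(doc.id)  # placeholder for the real analysis step
    return analyzed


docs = [Document(2504, ""), Document(2505, "Invoice text"), Document(2509, "  ")]
skipped = set()
first = scan_once(docs, skipped)   # records 2504 and 2509 as unprocessable
second = scan_once(docs, skipped)  # skips them without checking content again
```

In this sketch the second loop never inspects documents 2504 and 2509 again; whether such a marker would best live in a Paperless tag or in the application's own database is left open.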

Screenshots

[DEBUG] Found existing tag "unprocessed" via API with ID 40
[DEBUG] Filtering documents for tag IDs: [ 40 ]
[DEBUG] Fetched page 1, got 73 documents. [DEBUG] Total so far: 73
[DEBUG] Fetched page 2, got 100 tags. [DEBUG] Total so far: 200
[DEBUG] Finished fetching. Found 73 documents.
[DEBUG] Fetched page 3, got 96 tags. [DEBUG] Total so far: 296
[DEBUG] Document 2504 has no content, skipping analysis
[DEBUG] Document 2509 has no content, skipping analysis
[DEBUG] Document 2516 has no content, skipping analysis
[DEBUG] Document 2517 has no content, skipping analysis
[DEBUG] Document 2532 has no content, skipping analysis
[DEBUG] Document 2552 has no content, skipping analysis
[DEBUG] Document 2553 has no content, skipping analysis
[DEBUG] Document 2597 has no content, skipping analysis
[DEBUG] Document 2598 has no content, skipping analysis
[DEBUG] Document 2626 has no content, skipping analysis
[DEBUG] Document 2652 has no content, skipping analysis
[DEBUG] Document 2653 has no content, skipping analysis
[DEBUG] Document 2732 has no content, skipping analysis
[DEBUG] Document 2733 has no content, skipping analysis
[DEBUG] Document 2788 has no content, skipping analysis
[DEBUG] Document 2789 has no content, skipping analysis
[DEBUG] Document 2801 has no content, skipping analysis
[DEBUG] Document 2893 has no content, skipping analysis
[DEBUG] Document 2894 has no content, skipping analysis
[DEBUG] Document 2895 has no content, skipping analysis
[DEBUG] Document 2924 has no content, skipping analysis
[DEBUG] Document 2925 has no content, skipping analysis
[DEBUG] Document 2926 has no content, skipping analysis
[DEBUG] Document 2946 has no content, skipping analysis
[INFO] Task completed

Additional context
The documents come from incoming email. They are password-protected or encrypted.

I just found that after ~500 documents, paperless-ai was stuck on these for the whole night :)

@clusterzx
Owner

Hmm, that check should only take a few seconds, so I would not expect reprocessing these files to take even a minute.
But the "problem" is that to change this I would also have to rewrite the whole database structure, the dashboard frontend, and the main scanning function. It's an edge case right now, as it occurs very rarely.

I understand your issue, but I don't see it being stuck there. Stuck would mean that the application does not do anything after that specific case. But judging by the [INFO] Task completed line, it seems to work as expected.

> I just found that after ~500 documents paperless-ai stucked on it for whole night :)

What do you mean by 'for a whole night'?

@Baael
Author

Baael commented Jan 24, 2025

I have around 1000 documents to process, 480 with tag "unprocessed" left.

> I understand you issue but I dont see it stuck there. Stuck would mean that the application does not do anything after that specific case. But looking at the [INFO] Task completed it seems to work as expected.

I will check everything again, because I once more have a feeling that I was too quick and that I am the bug.
