This is an issue with an app that uses OCRmyPDF for OCR
I am using a recent version of the third party app
I will include a file that reproduces the issue
Third party app name and version
paperless-ngx 2.14.7
Describe the bug
I tried to upload a file (a BSAV statement: Beitragsorientierte Siemens Altersversorgung, the Siemens contribution-based pension plan), and the upload fails with the error shown below. Sorry, I can't share the file itself ;-)
When I print the document to a new PDF and upload that printed PDF, everything works as expected.
If you need more information than the stack trace, please ping me; I may be able to provide more debug information.
[2025-02-06 10:07:46,681] [DEBUG] [paperless.parsing.tesseract] Deleting directory /tmp/paperless/paperless-bj6qa3iw
[2025-02-06 10:07:46,685] [ERROR] [paperless.consumer] Error occurred while consuming document TRS_BSAV-Kontoauszug_Z003PVYF_2025-1.pdf: AttributeError: 'int' object has no attribute 'get'
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 174, in _run_pipeline
pdfinfo = do_get_pdfinfo(origin_pdf, executor, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 318, in do_get_pdfinfo
return get_pdfinfo(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 199, in get_pdfinfo
return PdfInfo(
^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 1170, in __init__
pscript5_mode = str(pdf.docinfo.get(Name.Creator, "")).startswith(
^^^^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'get'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 449, in parse
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
documents.parsers.ParseError: AttributeError: 'int' object has no attribute 'get'
[2025-02-06 10:07:46,743] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: TRS_BSAV-Kontoauszug_Z003PVYF_2025-1.pdf: Error occurred while consuming document TRS_BSAV-Kontoauszug_Z003PVYF_2025-1.pdf: AttributeError: 'int' object has no attribute 'get'
Traceback (most recent call last):
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 382, in parse
ocrmypdf.ocr(**args)
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/api.py", line 380, in ocr
return run_pipeline(options=options, plugin_manager=plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 214, in run_pipeline
return _run_pipeline(options, plugin_manager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 174, in _run_pipeline
pdfinfo = do_get_pdfinfo(origin_pdf, executor, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 318, in do_get_pdfinfo
return get_pdfinfo(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 199, in get_pdfinfo
return PdfInfo(
^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 1170, in __init__
pscript5_mode = str(pdf.docinfo.get(Name.Creator, "")).startswith(
^^^^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'get'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/usr/src/paperless/src/documents/consumer.py", line 477, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 449, in parse
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
documents.parsers.ParseError: AttributeError: 'int' object has no attribute 'get'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/src/paperless/src/documents/tasks.py", line 154, in consume_file
msg = plugin.run()
^^^^^^^^^^^^
File "/usr/src/paperless/src/documents/consumer.py", line 509, in run
self._fail(
File "/usr/src/paperless/src/documents/consumer.py", line 151, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: TRS_BSAV-Kontoauszug_Z003PVYF_2025-1.pdf: Error occurred while consuming document TRS_BSAV-Kontoauszug_Z003PVYF_2025-1.pdf: AttributeError: 'int' object has no attribute 'get'
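The traceback ends at a `.get(...)` call on `pdf.docinfo`, which in this file appears to be an integer rather than a dictionary-like object. The following is only a minimal sketch of the kind of defensive check that would avoid the `AttributeError`. It is not OCRmyPDF's actual code: `safe_creator` is a hypothetical helper, and the plain `dict` and `int` values are stand-ins for whatever the PDF library returns.

```python
def safe_creator(docinfo, key="/Creator"):
    """Return the Creator entry as a string, or "" if docinfo is not dict-like.

    Hypothetical helper for illustration only; `docinfo` stands in for the
    object the traceback calls `.get()` on.
    """
    getter = getattr(docinfo, "get", None)
    if not callable(getter):
        # Covers the case from the log: docinfo is an int, which has no .get()
        return ""
    return str(getter(key, ""))


# Assumed shapes: a well-formed info dictionary, and the malformed integer case.
print(safe_creator({"/Creator": "ExampleWriter 1.0"}))  # ExampleWriter 1.0
print(repr(safe_creator(0)))                            # ''
```

With a check like this, a malformed document info entry would be treated as "no Creator metadata" instead of aborting the whole consume task. Whether that is the right behavior for OCRmyPDF is a design decision for the maintainers.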
Steps to reproduce
Files
No response
OCRmyPDF version
No response
Relevant log output