PdfFileReader crashes on a normal PDF file #63
Comments
Hit another file with the same error 😢 It seems this is not so rare.
Update: these files can be opened normally with PyPDF2. It seems to be a bug introduced in PyPDF4.
Is this bug still valid?
Any updates on this?
In [1]: from PyPDF4 import PdfFileReader
In [2]: test_pdf = PdfFileReader(open('test.pdf', 'rb'))
In [3]: One_pdf = PdfFileReader(open('1.pdf', 'rb'))
In [4]: one_pdf = PdfFileReader(open('1.pdf', 'rb'))
In [5]: numbers = PdfFileReader(open('101880043.pdf', 'rb'))
---------------------------------------------------------------------------
PdfReadError Traceback (most recent call last)
<ipython-input-5-dfaa3ab77f43> in <module>
----> 1 numbers = PdfFileReader(open('101880043.pdf', 'rb'))
~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in __init__(self, stream, strict, warndest, overwriteWarnings)
1146 stream = BytesIO(b_(fileobj.read()))
1147 fileobj.close()
-> 1148 self.read(stream)
1149 self.stream = stream
1150
~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in read(self, stream)
1964 continue
1965 # no xref table found at specified location
-> 1966 raise utils.PdfReadError("Could not find xref table at specified location")
1967 #if not zero-indexed, verify that the table is correct; change it if necessary
1968 if self.xrefIndex and not self.strict:
PdfReadError: Could not find xref table at specified location
In [6]: test_pdf.getNumPages()
Out[6]: 1
In [7]: one_pdf.getNumPages()
Out[7]: 1
I got an error using @jucajuca's PDF sample, Using ...
You may post some code samples ...
The PDF file causing the error is attached. This one-page file was extracted from a PDF using Acrobat.
1.pdf
When it's opened with PdfFileReader and numPages is called, the script crashes with an exception.
But if I decompress this file first using qpdf, there's no error. It seems to be some sort of rare low-level structure.
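For anyone who wants to narrow this down before attaching a file: the error message says no xref table was found "at specified location", and that location is the byte offset written after the last `startxref` keyword near the end of the file. Below is a rough stdlib-only sketch that reports what actually sits at that offset. The function name `inspect_startxref` and the `tail_size` parameter are made up for illustration; this is not PyPDF4 code, and it only assumes the standard PDF trailer layout (`startxref`, the offset, then `%%EOF`).

```python
import io
import re


def inspect_startxref(stream, tail_size=1024):
    """Return (offset, description) for the last `startxref` entry.

    Hypothetical diagnostic helper, not part of PyPDF4. `stream` must be a
    seekable binary file object.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()

    # The trailer sits at the end of the file, so only read the tail.
    stream.seek(max(0, size - tail_size))
    tail = stream.read()

    # Trailer layout: "startxref", whitespace, decimal byte offset, "%%EOF".
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        return None, "no startxref keyword in the last %d bytes" % tail_size

    # Incrementally updated files may contain several; the last one wins.
    offset = int(matches[-1])
    if offset >= size:
        return offset, "offset points past end of file"

    stream.seek(offset)
    head = stream.read(32)
    if head.startswith(b"xref"):
        kind = "classic xref table"
    elif re.match(rb"\d+\s+\d+\s+obj", head):
        # PDF 1.5+ files may store the cross-reference data in a stream object.
        kind = "indirect object (possibly a cross-reference stream)"
    else:
        kind = "unexpected bytes: %r" % head[:16]
    return offset, kind
```

To try it on a failing file, something like `with open('101880043.pdf', 'rb') as f: print(inspect_startxref(f))` should do. If the result is "unexpected bytes", the stored offset itself may be slightly off, which could explain why a file rewritten by qpdf loads fine; that is a guess, not a confirmed diagnosis of this issue.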