PdfFileReader crashes on a normal PDF file #63

Open
askerlee opened this issue Sep 19, 2019 · 5 comments

@askerlee

askerlee commented Sep 19, 2019

The PDF file causing the error is attached. This one-page file was extracted from a larger PDF using Acrobat.
1.pdf

When I open it with PdfFileReader and access numPages, the script crashes with an exception:

pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (8, 0)

But if I decompress the file first using qpdf, there's no error. It seems the file contains some sort of rare low-level structure.
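For anyone who wants to try the same workaround, here is a minimal sketch of the "decompress with qpdf, then read" step. The thread does not say which qpdf options were used, so the two flags below are an assumption, and `qpdf_decompress_command` and `count_pages_via_qpdf` are hypothetical helper names, not part of PyPDF4.

```python
import subprocess


def qpdf_decompress_command(src, dst):
    """Build a qpdf command that rewrites src into an uncompressed copy at dst."""
    return [
        "qpdf",
        "--stream-data=uncompress",  # assumption: store stream data uncompressed
        "--object-streams=disable",  # assumption: expand object streams into plain objects
        src,
        dst,
    ]


def count_pages_via_qpdf(src, dst):
    """Rewrite src with qpdf, then count the pages of the rewritten copy."""
    # Imported here so the command builder works without PyPDF4 installed.
    from PyPDF4 import PdfFileReader

    result = subprocess.run(qpdf_decompress_command(src, dst))
    # qpdf exits with 3 when it only emitted warnings; treat that as success.
    if result.returncode not in (0, 3):
        raise RuntimeError("qpdf failed with exit code {}".format(result.returncode))
    with open(dst, "rb") as fh:
        return PdfFileReader(fh).numPages
```

This only sidesteps the exception by rewriting the file; it does not explain why the original xref data trips the reader.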

@askerlee
Author

I ran into another file with the same error 😢 It seems this is not so rare.
test.pdf

@askerlee
Author

askerlee commented Sep 23, 2019

Update: these files open normally with PyPDF2. It seems this is a bug introduced in PyPDF4.

@cadu-leite

Is this bug still valid?
I ask because I was able to merge those files with another one.

@jucajuca

jucajuca commented Mar 7, 2021

Any updates on this?
I am also having trouble reading the following file:

101880043.pdf

@cadu-leite

In [1]: from PyPDF4 import PdfFileReader

In [2]: test_pdf = PdfFileReader(open('test.pdf', 'rb'))

In [3]: One_pdf = PdfFileReader(open('1.pdf', 'rb'))

In [4]: one_pdf = PdfFileReader(open('1.pdf', 'rb'))

In [5]: numbers = PdfFileReader(open('101880043.pdf', 'rb'))
---------------------------------------------------------------------------
PdfReadError                              Traceback (most recent call last)
<ipython-input-5-dfaa3ab77f43> in <module>
----> 1 numbers = PdfFileReader(open('101880043.pdf', 'rb'))

~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in __init__(self, stream, strict, warndest, overwriteWarnings)
   1146             stream = BytesIO(b_(fileobj.read()))
   1147             fileobj.close()
-> 1148         self.read(stream)
   1149         self.stream = stream
   1150

~/.virtualenvs/pypdf4_issue_63/lib/python3.9/site-packages/PyPDF4/pdf.py in read(self, stream)
   1964                     continue
   1965                 # no xref table found at specified location
-> 1966                 raise utils.PdfReadError("Could not find xref table at specified location")
   1967         #if not zero-indexed, verify that the table is correct; change it if necessary
   1968         if self.xrefIndex and not self.strict:

PdfReadError: Could not find xref table at specified location

In [6]: test_pdf.getNumPages()
Out[6]: 1

In [7]: one_pdf.getNumPages()
Out[7]: 1

I got an error using @jucajuca's PDF sample,
but the others seem to work.

Using ...

  • Python 3.9
  • PyPDF4==1.27.0

Could you post some code samples that reproduce the error?
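As a starting point for such a sample, here is a minimal sketch that tries each attached file and records either the page count or the exception text. `check_files` is a hypothetical helper, and passing the reader class as an argument is just one way to run the same check against both PyPDF2 and PyPDF4.

```python
def check_files(reader_cls, paths):
    """Try to count pages in each file; record the page count or the error text."""
    results = {}
    for path in paths:
        try:
            with open(path, "rb") as fh:
                # numPages is read while the file is still open.
                results[path] = reader_cls(fh).numPages
        except Exception as exc:  # e.g. PdfReadError or a missing file
            results[path] = "{}: {}".format(type(exc).__name__, exc)
    return results
```

Calling `check_files(PdfFileReader, ['1.pdf', 'test.pdf', '101880043.pdf'])` once with `PdfFileReader` imported from PyPDF4 and once with the one from PyPDF2 should show which library fails on which file.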
