Invalid Literal for int() ... for a PDF download from Google Docs Spreadsheet #86

Open
cadu-leite opened this issue Jul 27, 2020 · 4 comments

@cadu-leite

The PDF file is attached
pdf_sample_googlesheet_pages_02.pdf

traceback:

  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1599, in getObject
    idnum, generation = self.readObjectHeader(self.stream)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'F-1.4'

Code with a test included can be seen in this repo (merge2pdf).

@pubpub-zz

Hi Cadu-Leite,
Your PDF has some free objects (outlines, JS) that are referenced. I've introduced a fix in my pre-release (https://github.com/pubpub-zz/PyPDF4/releases/tag/1.27.0ppZZ). Please note that this version has been heavily rewritten. In principle, I've kept backward compatibility. I've also started to upgrade the documentation. Can you tell me if it works for you?
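
One possible reading of the first traceback, as a minimal sketch rather than a diagnosis of the attached file: in a classic cross-reference table, the first field of a free (`f`) entry is the number of the next free object, not a byte offset. If a reader follows a reference to such an entry and seeks to that number, it can land inside the `%PDF-1.4` header and try to parse header bytes as an object number. The sample bytes, the entry value `3`, and the helper names below are all hypothetical, chosen only to reproduce the same kind of token.

```python
import io

# Hypothetical, hand-written fragment: only the first two lines of a PDF file.
SAMPLE = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def parse_xref_entry(entry: bytes):
    """Parse one classic xref entry: 'nnnnnnnnnn ggggg n' (in use) or '... f' (free)."""
    first, generation, kind = entry.split()
    return int(first), int(generation), kind == b"n"


def read_token(stream):
    """Read bytes up to (not including) the next whitespace byte or end of stream."""
    token = b""
    while True:
        ch = stream.read(1)
        if not ch or ch.isspace():
            return token
        token += ch


# Assumed entry for illustration: a free object whose "next free object" field is 3.
value, generation, in_use = parse_xref_entry(b"0000000003 00001 f")

stream = io.BytesIO(SAMPLE)
stream.seek(value)          # mistake: treating the next-free number as a byte offset
token = read_token(stream)  # reads header bytes instead of an object number

try:
    int(token)
    message = ""
except ValueError as exc:
    message = str(exc)

print(in_use, token, message)
```

A reader that checks the `n`/`f` flag before seeking can instead report the problem directly, which is what the later "Cannot fetch a free object" error in this thread appears to do.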

@cadu-leite
Author

cadu-leite commented Aug 9, 2020

https://github.com/pubpub-zz/PyPDF4/releases/tag/1.27.0ppZZ

I believe you changed the namespace from PyPDF4 to pypdf ... that has to be in BIG LETTERS in the docs.

Continuing...

The error has changed, but it still occurs on the same PDF file, a Google Sheet exported to PDF.

Traceback when trying to read a PDF from Google Sheets:

    Traceback (most recent call last):
      File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
        m.merge_pdfs()
      File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 89, in merge_pdfs
        merged_pdf.append(fileobj = file_name)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 146, in append
        self.merge(None, fileobj, bookmark, numpages, import_bookmarks)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 116, in merge
        self._copy_bookmarks(fileobj.root_object["/Outlines"], bkmark, srcpages)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 430, in __getitem__
        return dict.__getitem__(self, key).getObject()
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 214, in getObject
        return self.pdf.getObject(self).getObject()
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 488, in get_object
        retval = self._get_object_by_ref(ref, self.R_XTABLE)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 284, in _get_object_by_ref
        raise PdfReadError("Cannot fetch a free object (id, next gen.) = (%d, %d)"
    pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (2, 0)

Then, after removing the Google Sheet PDF from the list of PDFs to be merged,
I got another error.

It seems you changed the keyword parameters ... that's not nice. It will break a lot of scripts, and there is a Pythonic way to handle it: you could accept both names, or not change them at all.

    Traceback (most recent call last):
      File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
        m.merge_pdfs()
      File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 87, in merge_pdfs
        merged_pdf.append(fileobj = file_name, pages = page_range)
    TypeError: append() got an unexpected keyword argument 'pages'

But OK, I changed the parameter name to numpages and it works.
But the problem with PDFs that come from Google Sheets remains.

The rest seems to be OK.
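
On accepting both keyword names: a minimal sketch of one common pattern, assuming `pages` is the older name and `numpages` the newer one, as the traceback above suggests. `MergerSketch` is a hypothetical stand-in, not the library's merger class, and it only records its arguments instead of merging anything.

```python
import warnings


class MergerSketch:
    """Hypothetical stand-in for a merger; records calls instead of merging PDFs."""

    def __init__(self):
        self.calls = []

    def append(self, fileobj, bookmark=None, numpages=None,
               import_bookmarks=True, pages=None):
        # Accept the legacy keyword 'pages' as an alias for 'numpages'.
        if pages is not None:
            if numpages is not None:
                raise TypeError("pass either 'pages' or 'numpages', not both")
            warnings.warn(
                "'pages' is deprecated; use 'numpages' instead",
                DeprecationWarning,
                stacklevel=2,
            )
            numpages = pages
        self.calls.append((fileobj, numpages))


merger = MergerSketch()
merger.append("a.pdf", numpages=(0, 2))  # new keyword
merger.append("b.pdf", pages=(1, 3))     # old keyword still works, with a warning
```

Existing scripts keep running and see a `DeprecationWarning`, while new code can use the new name straight away.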

@cadu-leite
Author

Please let me know if I can help with anything else.

@pubpub-zz

Yes!
Thanks for the test and the report. I had a look:
About PyPDF4 being renamed to pypdf: that was claird's choice (I don't know why).
First, for the issue with the Google Sheet PDF: I forgot to tell you to set strict to False in the merger init in order to make the merger tolerant of 'erroneous' files:
merged_pdf = PdfFileMerger(strict=False)
Also, I've confirmed the broken API you raised: I've fixed it.
Finally, I found an issue when a NullObject is returned for outlines. Fixed also.
I've run your test successfully.
Find the update of my library below. The changes have been committed, but I would like a few beta testers before tagging it.

pypdf4-1.27.0PPzz_1-py2.py3-none-any.whl.zip

Thanks for your feedback.
