Invalid Literal for int() ... for a PDF download from GoogleDocs SPreadSheet #86

cadu-leite · 2020-07-27T22:25:11Z

The PDF file is attached
pdf_sample_googlesheet_pages_02.pdf

traceback:

  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1599, in getObject
    idnum, generation = self.readObjectHeader(self.stream)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'F-1.4'
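
For context, `b'F-1.4'` looks like a fragment of the `%PDF-1.4` file header, which suggests the reader was sent to an offset that does not hold an object header. Below is a minimal sketch of that failure mode. It assumes a much simplified header parser: `read_object_header` here is a hypothetical stand-in, not PyPDF2's implementation, and the sample bytes are made up for illustration.

```python
import io


def read_object_header(stream):
    """Parse 'idnum generation obj' at the current position (simplified)."""
    tokens = stream.readline().split()
    # int() accepts ASCII digit bytes and raises ValueError for anything else.
    return int(tokens[0]), int(tokens[1])


# Made-up miniature file: a header line followed by one object.
pdf_bytes = b"%PDF-1.4\n2 0 obj\n<< >>\nendobj\n"

# A correct offset lands on "2 0 obj".
good = io.BytesIO(pdf_bytes)
good.seek(pdf_bytes.index(b"2 0 obj"))
header = read_object_header(good)

# A bogus offset lands inside the "%PDF-1.4" header instead.
bad = io.BytesIO(pdf_bytes)
bad.seek(3)
try:
    read_object_header(bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the second read fails with the same kind of `ValueError` as the traceback above, because the bytes at that position are not an integer.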


Code with a test included can be seen in this repo (merge2pdf)


pubpub-zz · 2020-08-05T09:10:28Z

Hi Cadu-Leite,
Your PDF has some free objects (outlines, JS) that are referenced. I've introduced a fix in my pre-release (https://github.com/pubpub-zz/PyPDF4/releases/tag/1.27.0ppZZ). Please note that this version has been deeply rewritten. I've tried to keep backward compatibility. I've also started to upgrade the documentation. Can you tell me if it works for you?

cadu-leite · 2020-08-09T23:00:19Z

https://github.com/pubpub-zz/PyPDF4/releases/tag/1.27.0ppZZ

I believe you changed the namespace from PyPDF4 to pypdf ... that has to be in BIG LETTERS in the docs.

Continuing...

The error has changed, but it still occurs on the same PDF file, a Google Sheet exported to PDF.

Traceback when trying to read a PDF from Google Sheets:

    Traceback (most recent call last):
      File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
        m.merge_pdfs()
      File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 89, in merge_pdfs
        merged_pdf.append(fileobj = file_name)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 146, in append
        self.merge(None, fileobj, bookmark, numpages, import_bookmarks)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 116, in merge
        self._copy_bookmarks(fileobj.root_object["/Outlines"], bkmark, srcpages)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 430, in __getitem__
        return dict.__getitem__(self, key).getObject()
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 214, in getObject
        return self.pdf.getObject(self).getObject()
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 488, in get_object
        retval = self._get_object_by_ref(ref, self.R_XTABLE)
      File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 284, in _get_object_by_ref
        raise PdfReadError("Cannot fetch a free object (id, next gen.) = (%d, %d)"
    pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (2, 0)

Then, after taking the Google Sheet PDF off the list of PDFs to be merged,
I got another error.

It seems you changed the keyword parameters ... that's not nice. It will break a lot of scripts, and there is a Pythonic way to handle it: you could accept both names, or not change them at all.

    Traceback (most recent call last):
      File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
        m.merge_pdfs()
      File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 87, in merge_pdfs
        merged_pdf.append(fileobj = file_name, pages = page_range)
    TypeError: append() got an unexpected keyword argument 'pages'
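
One way to accept both names, as suggested above, is to treat the old keyword as a deprecated alias. The following is only a sketch: the `Merger` class is a hypothetical stand-in rather than the library's code, and the parameter defaults are assumptions (only the names `fileobj`, `bookmark`, `numpages`, `import_bookmarks` and `pages` come from the tracebacks).

```python
import warnings


class Merger:
    """Hypothetical stand-in for a merger class; not the library's code."""

    def __init__(self):
        self.calls = []

    def append(self, fileobj, bookmark=None, numpages=None,
               import_bookmarks=True, **kwargs):
        # Accept the legacy keyword 'pages' as an alias for 'numpages'.
        if "pages" in kwargs:
            if numpages is not None:
                raise TypeError("pass either 'pages' or 'numpages', not both")
            warnings.warn("'pages' is deprecated; use 'numpages'",
                          DeprecationWarning, stacklevel=2)
            numpages = kwargs.pop("pages")
        # Anything else that is unknown is still an error.
        if kwargs:
            raise TypeError("append() got an unexpected keyword argument %r"
                            % next(iter(kwargs)))
        # Record the call instead of merging; this is only a demonstration.
        self.calls.append((fileobj, numpages))


merger = Merger()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merger.append(fileobj="a.pdf", pages=(0, 2))     # old spelling still works
    merger.append(fileobj="b.pdf", numpages=(0, 2))  # new spelling
```

With this approach existing scripts keep running while the warning points callers to the new name.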

But OK, I changed the parameter name to numpages and it works.
The problem with PDFs that come from Google Sheets remains, though.

The rest seems to be OK.

cadu-leite · 2020-08-09T23:01:37Z

Please let me know if I can help with anything else.

pubpub-zz · 2020-08-10T17:32:34Z

Yes!
Thanks for the test and the report. I had a look:
About PyPDF4 being renamed to pypdf: that was claird's choice (I don't know why).
First, for the issue with the Google Sheet PDF, I forgot to tell you to set strict to False in the merger init in order to make the merger tolerant of 'erroneous' files:
merged_pdf = PdfFileMerger(strict=False)
Also, I've reproduced the API break you raised and fixed it.
Finally, I found an issue when a NullObject is returned for outlines. Fixed as well.
I've run your test successfully.
Find the update of my library below. The changes have been committed, but I would like a few beta testers to try it before tagging it.

pypdf4-1.27.0PPzz_1-py2.py3-none-any.whl.zip

Thanks for your feedback.
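
To picture what a tolerant (strict=False) mode can mean for the "Cannot fetch a free object" error above, here is a toy model. Everything in it is an assumption made for illustration: `ToyReader`, its dictionary-based xref table and the local `PdfReadError` class are hypothetical and do not describe how pypdf is actually implemented.

```python
class PdfReadError(Exception):
    """Local stand-in for the library's error type."""


class ToyReader:
    """Toy model of an xref table; hypothetical, not pypdf's reader."""

    def __init__(self, xref, strict=True):
        # xref maps object id -> object, or None when the entry is free.
        self.xref = xref
        self.strict = strict

    def get_object(self, idnum):
        obj = self.xref.get(idnum)
        if obj is None:
            if self.strict:
                raise PdfReadError(
                    "Cannot fetch a free object (id) = (%d)" % idnum)
            # Tolerant mode: treat the dangling reference as a null object.
            return None
        return obj


# The catalog references object 2 as its outlines, but object 2 is free.
xref = {1: {"/Type": "/Catalog", "/Outlines": 2}, 2: None}

tolerant = ToyReader(xref, strict=False)
outlines = tolerant.get_object(xref[1]["/Outlines"])  # no exception raised

strict = ToyReader(xref, strict=True)
try:
    strict.get_object(2)
    raised = False
except PdfReadError:
    raised = True
```

In this model the caller must then cope with a null outlines entry, which is the kind of case the NullObject fix mentioned above addresses.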
