This repository has been archived by the owner on Nov 8, 2024. It is now read-only.

Missing page #63

Open
thomassajot opened this issue Sep 13, 2020 · 3 comments

thomassajot commented Sep 13, 2020

Hello,
Surprisingly, some pages go missing when I use pdf_annotate.
Example PDF (212 pages): https://www.hkexgroup.com/-/media/HKEX-Group-Site/ssd/Investor-Relations/Regulatory-Reports/documents/2016/160321ar_e.pdf?la=en

When I run the following code, the new file is missing 2 pages: the second page and the second-to-last page. Any idea why?

from pdf_annotate import PdfAnnotator

pdf_file = 'file_path_to.pdf'
copy_file = 'copy_file_path_to.pdf'
# Open the PDF and write it straight back out, without adding any annotations.
annotator = PdfAnnotator(pdf_file)
annotator.write(copy_file)
@mjbryant
Contributor

This is likely due to pdfrw, the underlying library that pdf_annotate uses to read, edit, and write PDFs. You could try reading in and writing back out that file using just pdfrw and see if the pages are missing.
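
A minimal sketch of that check, assuming pdfrw is installed. The function names and the three return labels are made up for illustration; only `PdfReader`, `PdfWriter`, `addpages`, and `write` come from pdfrw. It reads the file with pdfrw alone, writes it back out, and reports whether pages disappear on the read or on the write.

```python
def roundtrip_page_counts(src_path, dst_path):
    """Read a PDF with pdfrw only, write it back out, and return
    (pages_read, pages_written)."""
    # Imported here so the helper below stays usable without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    writer.addpages(reader.pages)  # copy every page pdfrw managed to find
    writer.write(dst_path)
    # Re-read the copy to see how many pages survived the write.
    return len(reader.pages), len(PdfReader(dst_path).pages)


def where_pages_are_lost(expected, pages_read, pages_written):
    """Classify the round trip (hypothetical helper, illustrative labels)."""
    if pages_read < expected:
        return "lost on read"
    if pages_written < pages_read:
        return "lost on write"
    return "ok"
```

For the 212-page file above, something like `where_pages_are_lost(212, *roundtrip_page_counts(pdf_file, copy_file))` should show whether pdfrw already drops the pages while reading, which would rule out pdf_annotate itself.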

jerrian commented Jul 16, 2021

I ran into a similar problem, and it comes from pdfrw's PdfReader, as shown below. (test.pdf actually has 19 pages.)

>>> from pdfrw import PdfReader
>>> from PyPDF2 import PdfFileReader
>>> filename = './test.pdf'
>>> pdf_reader = PdfReader(filename)
>>> len(pdf_reader.pages)
2
>>> pdf_file_reader = PdfFileReader(open(filename, 'rb'))
>>> pdf_file_reader.getNumPages()
19
>>> from PyPDF3 import PdfFileReader
>>> pdf_file_reader = PdfFileReader(open(filename, 'rb'))
>>> pdf_file_reader.getNumPages()
19

I raised this on the pdfrw repo and am still waiting for a reply, but I doubt I will get one, since that repo has had no changes since 2018:
Can't use preexisting streams like pyPdf while initializing PdfReader

Could you change PdfAnnotator, or add an option, to use PdfFileReader and PdfFileWriter from PyPDF3, which is a fork of PyPDF2 and is still actively maintained?
