pdfrw vs pypdf page extraction & merge #7
Comments
Thank you for sharing! I just updated my
I've just noticed that you should definitely apply compression when merging stuff with pypdf. Look at the diff of the README: a78f609
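For reference, here is a minimal sketch of what "merge with compression" could look like in pypdf. It is not the code from either benchmark: the helper name `merge_pdfs` and its parameters are made up for illustration, and it assumes a pypdf version (3.0 or later) where `PdfWriter` provides `append()` and pages provide `compress_content_streams()`.

```python
import os
from typing import Iterable


def merge_pdfs(paths: Iterable[str], out_path: str, compress: bool = True) -> int:
    """Merge the PDFs in `paths` into `out_path` and return the output size in bytes.

    Hypothetical helper for illustration only.
    """
    # Imported here so that a benchmark script comparing several libraries
    # can define this helper even if pypdf is not installed.
    from pypdf import PdfWriter

    merger = PdfWriter()
    for path in paths:
        merger.append(path)  # appends all pages of the file

    if compress:
        # Re-encode each page's content stream with Flate compression.
        for page in merger.pages:
            page.compress_content_streams()  # This is CPU intensive!

    merger.write(out_path)
    return os.path.getsize(out_path)
```

Calling it once with `compress=True` and once with `compress=False` on the same inputs gives two file sizes to compare directly.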
EDIT: Sorry, I misunderstood (I thought you were asking about pypdf).
I asked this question and then used this installation:
So my current output says it's sarnold's pdfrw version 0.5.0. Anyway, the differences vs pypdf were pretty much the same.
Thank you 🙏 I just added pdfrw to my comparison of watermarking speed/file size: commit / rendered output. It's crazy. I expected it to be faster, but not by that much, and I didn't expect a difference in file size. pdfrw did an awesome job there. Thank you for pointing that out!
What about my pypdf code implementation? Do you see something wrong?
I edited my code above:
    if compress:  # Compress the data
        for page in merger.pages:
            page.compress_content_streams()  # This is CPU intensive!
Do you see something else to change in my pypdf usage? Change 1 reduced total
Change 2 didn't work as expected:
So I'd say there must be something wrong either in my pypdf code or in the pypdf compression algorithm. Indeed, total times vary a little bit between identical script runs.
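Since the totals move a bit from run to run, one way to compare the two libraries more fairly is to repeat each measurement and look at the minimum and median instead of a single total. The sketch below is only an illustration using the standard library; `time_runs` and `summarize` are hypothetical names, not part of either benchmark script.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(func: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `func` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, `summarize(time_runs(lambda: my_merge(files), repeats=5))`, where `my_merge` stands for whichever merge function is being measured. The minimum is usually the least noisy figure, because background load can only make a run slower.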
I've just noticed that you used the same files as I did in my benchmark. Nice!
Test run on Python 3.8, Windows 7:
I recall my initial code also deleted the original bookmarks/annotations from the PDFs, but I removed that part for simplicity and left a comment noting where I had read about that.
Code:
OUTPUT: