Refactor the code #2

Open

J-F-Liu opened this issue Dec 29, 2016 · 4 comments

J-F-Liu commented Dec 29, 2016

I feel the current code is difficult to extend with new features.
I suggest using lopdf for PDF object serialization.
Here is some example code:

extern crate lopdf;
use lopdf::{Document, Dictionary, Stream, StringFormat};
use lopdf::Object::{Null, Integer, Name, String, Reference};

fn main() {
    let mut doc = Document::new();
    doc.version = "1.5".to_string();

    // Each primitive object type has a direct representation:
    doc.add_object(Null);
    doc.add_object(true);
    doc.add_object(3);
    doc.add_object(0.5);
    doc.add_object(String("text".as_bytes().to_vec(), StringFormat::Literal));
    doc.add_object(Name("name".to_string()));
    doc.add_object(Reference((1, 0)));
    doc.add_object(vec![Integer(1), Integer(2), Integer(3)]);
    doc.add_object(Stream::new(Dictionary::new(), vec![0x41, 0x42, 0x43]));

    // Dictionaries are built entry by entry, then added as objects:
    let mut dict = Dictionary::new();
    dict.set("A", Null);
    dict.set("B", false);
    dict.set("C", Name("name".to_string()));
    doc.add_object(dict);

    // Serialization happens once, when the whole document is saved:
    doc.save("test.pdf").unwrap();
}
kaj (Owner) commented Dec 29, 2016

One of the goals for rust-pdf is to be able to create large documents with a small memory footprint, by writing each object to the file as soon as possible (and by serializing dictionaries immediately). Lopdf seems to take the opposite approach, keeping everything in memory as high-level objects until finally serializing the entire document.
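
Roughly, the streaming idea looks like this (a sketch only; the struct and method names are invented here, not rust-pdf's actual API): each object goes straight to the underlying writer as soon as it is complete, and only its byte offset is retained, because the cross-reference table at the end of the file needs it.

use std::io::{self, Write};

struct StreamingWriter<W: Write> {
    out: W,
    offsets: Vec<u64>, // byte offset of each object, for the xref table
    position: u64,     // total bytes written so far
}

impl<W: Write> StreamingWriter<W> {
    fn write_object(&mut self, body: &[u8]) -> io::Result<()> {
        // Record where this object starts, then stream it out immediately;
        // nothing but the offset stays in memory.
        self.offsets.push(self.position);
        let header = format!("{} 0 obj\n", self.offsets.len());
        self.out.write_all(header.as_bytes())?;
        self.out.write_all(body)?;
        self.out.write_all(b"\nendobj\n")?;
        self.position += (header.len() + body.len() + 8) as u64;
        Ok(())
    }
}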

J-F-Liu (Author) commented Dec 30, 2016

Normally a PDF document won't be very large, ranging from tens of KB to hundreds of MB. Memory size is not a bottleneck on today's computers. By keeping the whole document in memory, the stream length can be pre-calculated, so there is no need to use a reference object for the Length entry; the resulting PDF file is smaller for distribution and faster for PDF consumers to process.
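
To illustrate the point about the Length entry (a minimal sketch; the function is my own illustration): with the stream body fully buffered, /Length can be written as a direct value, whereas a streaming writer doesn't know the length yet and has to emit /Length n 0 R and fill in a separate length object after the data has gone out.

use std::io::{self, Write};

fn write_stream<W: Write>(out: &mut W, data: &[u8]) -> io::Result<()> {
    // The length is known up front because `data` is fully in memory.
    write!(out, "<< /Length {} >>\nstream\n", data.len())?;
    out.write_all(data)?;
    out.write_all(b"\nendstream")
}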

Producing a document is a one-time effort, while consuming it happens many times over.

saethlin commented

Just out of curiosity, I cloned the repo, changed all the writes to an internal String buffer, and added one last write at the very end to dump the buffer to the open file. I've disabled fonts for the moment; it looks like those might not be so easy.
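
In outline, the experiment looks like this (a minimal sketch with placeholder names; the real change touches every write site):

use std::fs::File;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    // Every write that used to hit the file goes to this buffer instead.
    let mut buffer: Vec<u8> = Vec::new();
    write!(buffer, "%PDF-1.5\n")?;
    // ... serialize all objects into `buffer` ...

    // One final write dumps the whole document to disk at the end.
    File::create("out.pdf")?.write_all(&buffer)
}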

I'm not really sure what qualifies as a large PDF document, but the circle output is about 5 kB, so call that a small PDF. The original implementation runs in about 2 ms, and writing to an internal buffer brings that down to 0.86 ms.

If I take the mandala example and pass 100 on the command line, it spits out an 832 kB document, which is about the same size as some papers on arXiv. The original implementation runs in about 250 ms, while writing to an internal buffer finishes in 11 ms. I'm not even sure this qualifies as a large PDF, given that it's only twice as large as the binary these examples compile to.

I quite like the API you've started building here, but I'm not comfortable with the constant writing to disk. Are you still committed to minimizing memory footprint?

kaj (Owner) commented May 8, 2019

Constant writing is not the same as constant writing to disk. In some cases the writing is to an internal buffer (e.g. a Vec<u8>) anyway, and when writing to an actual file on disk it should probably go through a std::io::BufWriter.
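
For instance (a minimal sketch): a BufWriter batches the many small writes in an in-memory buffer and flushes them to the OS in large chunks, without holding the whole document in memory at once.

use std::fs::File;
use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let file = File::create("out.pdf")?;
    // Small writes land in BufWriter's buffer, not in syscalls.
    let mut out = BufWriter::new(file);
    write!(out, "%PDF-1.5\n")?;
    // ... write each object to `out` as soon as it is complete ...
    out.flush()
}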
