Refactor the code #2

Open

J-F-Liu opened this issue Dec 29, 2016 · 4 comments

J-F-Liu commented Dec 29, 2016

I feel the current code is difficult to extend with new features.
I suggest using lopdf for PDF object serialization.
Here is some example code:

extern crate lopdf;
use lopdf::{Document, Dictionary, Stream, StringFormat};
use lopdf::Object::{Null, Integer, Name, String, Reference};

fn main() {
    let mut doc = Document::new();
    doc.version = "1.5".to_string();

    // Each primitive object type has a direct representation:
    doc.add_object(Null);
    doc.add_object(true);
    doc.add_object(3);
    doc.add_object(0.5);
    doc.add_object(String("text".as_bytes().to_vec(), StringFormat::Literal));
    doc.add_object(Name("name".to_string()));
    doc.add_object(Reference((1, 0)));
    doc.add_object(vec![Integer(1), Integer(2), Integer(3)]);
    doc.add_object(Stream::new(Dictionary::new(), vec![0x41, 0x42, 0x43]));

    // Dictionaries are built entry by entry, then added as objects:
    let mut dict = Dictionary::new();
    dict.set("A", Null);
    dict.set("B", false);
    dict.set("C", Name("name".to_string()));
    doc.add_object(dict);

    // Serialization happens once, when the whole document is saved:
    doc.save("test.pdf").unwrap();
}
kaj (Owner) commented Dec 29, 2016

One of the goals for rust-pdf is to be able to create large documents with a small memory footprint, by writing each object to the file as soon as possible (and by serializing dictionaries immediately). Lopdf seems to take the opposite approach, keeping everything in memory as high-level objects until finally serializing the entire document.
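
Roughly, the streaming idea looks like this (a sketch only; the struct and method names are invented here, not rust-pdf's actual API): each object goes straight to the underlying writer as soon as it is complete, and only its byte offset is retained, because the cross-reference table at the end of the file needs it.

use std::io::{self, Write};

struct StreamingWriter<W: Write> {
    out: W,
    offsets: Vec<u64>, // byte offset of each object, for the xref table
    position: u64,     // total bytes written so far
}

impl<W: Write> StreamingWriter<W> {
    fn write_object(&mut self, body: &[u8]) -> io::Result<()> {
        // Record where this object starts, then stream it out immediately;
        // nothing but the offset stays in memory.
        self.offsets.push(self.position);
        let header = format!("{} 0 obj\n", self.offsets.len());
        self.out.write_all(header.as_bytes())?;
        self.out.write_all(body)?;
        self.out.write_all(b"\nendobj\n")?;
        self.position += (header.len() + body.len() + 8) as u64;
        Ok(())
    }
}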

J-F-Liu (Author) commented Dec 30, 2016

Normally a PDF document won't be very large, ranging from tens of KB to hundreds of MB. Memory size is not a bottleneck on today's computers. By keeping the whole document in memory, the stream length can be pre-calculated, so there is no need to use a reference object for the Length entry; the resulting PDF file is smaller for distribution and faster for PDF consumers to process.
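
To illustrate the point about the Length entry (a minimal sketch; the function is my own illustration): with the stream body fully buffered, /Length can be written as a direct value, whereas a streaming writer doesn't know the length yet and has to emit /Length n 0 R and fill in a separate length object after the data has gone out.

use std::io::{self, Write};

fn write_stream<W: Write>(out: &mut W, data: &[u8]) -> io::Result<()> {
    // The length is known up front because `data` is fully in memory.
    write!(out, "<< /Length {} >>\nstream\n", data.len())?;
    out.write_all(data)?;
    out.write_all(b"\nendstream")
}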

Producing a document is a one-time effort, while consuming it happens many times over.

saethlin commented

Just out of curiosity, I cloned the repo, changed all the writes to an internal String buffer, and added one last write at the very end to dump the buffer to the open file. I've disabled fonts for the moment; it looks like those might not be so easy.
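
In outline, the experiment looks like this (a minimal sketch with placeholder names; the real change touches every write site):

use std::fs::File;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    // Every write that used to hit the file goes to this buffer instead.
    let mut buffer: Vec<u8> = Vec::new();
    write!(buffer, "%PDF-1.5\n")?;
    // ... serialize all objects into `buffer` ...

    // One final write dumps the whole document to disk at the end.
    File::create("out.pdf")?.write_all(&buffer)
}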

I'm not really sure what qualifies as a large PDF document, but the circle output is about 5 kB, so call that a small PDF. The original implementation runs in about 2 ms, and writing to an internal buffer brings that down to 0.86 ms.

If I take the mandala example and pass 100 on the command line, it spits out an 832 kB document, which is about the same size as some papers on arXiv. The original implementation runs in about 250 ms, while writing to an internal buffer finishes in 11 ms. I'm not even sure this qualifies as a large PDF, given that it's only twice as large as the binary these examples compile to.

I quite like the API you've started building here, but I'm not comfortable with the constant writing to disk. Are you still committed to minimizing memory footprint?

kaj (Owner) commented May 8, 2019

Constant writing is not the same as constant writing to disk. In some cases the writing is to an internal buffer (e.g. a Vec<u8>) anyway, and when writing to an actual file on disk it should probably go through a std::io::BufWriter.
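
For instance (a minimal sketch): a BufWriter batches the many small writes in an in-memory buffer and flushes them to the OS in large chunks, without holding the whole document in memory at once.

use std::fs::File;
use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let file = File::create("out.pdf")?;
    // Small writes land in BufWriter's buffer, not in syscalls.
    let mut out = BufWriter::new(file);
    write!(out, "%PDF-1.5\n")?;
    // ... write each object to `out` as soon as it is complete ...
    out.flush()
}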
