Multiple improvements across multiple forks #49

Open
wants to merge 28 commits into base: master

Commits
f652dbf
Cargo.toml: customize crate metadata
badicsalex Feb 27, 2022
2175aaf
Cargo.toml: upgrade to 2021 edition
badicsalex Feb 27, 2022
eb6c393
Use log::{warn, info} instead of printlns
badicsalex Feb 27, 2022
071dd6b
Project: run cargo fmt
fisherdarling Feb 20, 2022
3d0d448
do not panic on unknown char_decode
fisherdarling Feb 20, 2022
1d05148
lib.rs: Add allow(dead_code) for some currently unused struct fields
badicsalex Feb 27, 2022
433f31a
lib.rs: Fixed clippy issues
badicsalex Feb 27, 2022
053587f
Cargo.toml: use num_parser with lopdf
badicsalex Feb 27, 2022
400ed4b
glyphnames: use the phf library instead of binary search
badicsalex Feb 27, 2022
6f65918
lib.rs: republish lopdf::Document
badicsalex Feb 27, 2022
3e415a6
lib.rs: Publish PdfFont, and give it to OutputDev
badicsalex Feb 27, 2022
7f260b3
PdfFont: don't panic! when get_width is called on an unknown code
badicsalex Feb 27, 2022
11ec1b4
PdfFont: export get_basefont
badicsalex Feb 28, 2022
da89f72
PdfSimpleFont.new: mark unicode mismatch as dlog
badicsalex Mar 14, 2022
c2785e5
make_colorspace: patch over the Separation-type colorspace handling
badicsalex Aug 20, 2022
3c4509d
make_colorspace: don't panic on unknown color spaces
badicsalex Sep 19, 2022
a0f934c
examples: fix crate name in extract example
badicsalex Sep 19, 2022
eda8dce
Fix colorspace in SC command handling.
badicsalex Sep 19, 2022
8e1eeae
project: remove github CI integration
badicsalex Sep 19, 2022
24043ab
Pass CID to OutputDev.show_text implementers
badicsalex Sep 19, 2022
5d40745
lib: don't panic on unknown smask entries
badicsalex Sep 26, 2022
00ea73f
fix(lib): apply patch for smask
QPixel Oct 5, 2022
2dd7ecf
Merge https://github.com/badicsalex/pdf-extract-fhl
QPixel Oct 5, 2022
cb9723b
chore: update package name and version
QPixel Oct 5, 2022
88f51d5
fix: bump lopdf version
QPixel Oct 5, 2022
8634a7b
fix build errros
Hessesian Dec 1, 2022
16e2a0e
add newlines to plain text
Hessesian Dec 1, 2022
1077a88
add loading file from other sources (mem, read)
Hessesian Dec 1, 2022
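
Note on 400ed4b (glyphnames: use the phf library instead of binary search): the glyphnames diff is not shown on this page, so the following is only a minimal sketch of the kind of binary-search lookup that the commit title says was replaced. The table contents, the `GLYPHS` name, and the `glyph_to_unicode` function are assumptions made for illustration, not code from this PR. Only the standard library is used.

```rust
/// Illustrative subset of a glyph-name table (hypothetical, not the PR's table).
/// Entries must stay sorted by name in byte order for the binary search to work.
static GLYPHS: &[(&str, u16)] = &[
    ("A", 0x0041),
    ("a", 0x0061),
    ("space", 0x0020),
    ("zero", 0x0030),
];

/// Look up a glyph name with a binary search over the sorted table.
/// Returns `None` for unknown names instead of panicking.
fn glyph_to_unicode(name: &str) -> Option<u16> {
    GLYPHS
        .binary_search_by(|&(n, _)| n.cmp(name))
        .ok()
        .map(|i| GLYPHS[i].1)
}
```

A binary search like this costs O(log n) string comparisons per lookup and silently breaks if the table is ever left unsorted. A compile-time perfect hash map such as the one the `phf` crate generates avoids both issues, which is presumably the motivation for the commit.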
22 changes: 0 additions & 22 deletions .github/workflows/rust.yml

This file was deleted.

12 changes: 7 additions & 5 deletions Cargo.toml
@@ -1,13 +1,13 @@
 [package]
 authors = ["Jeff Muizelaar <[email protected]>"]
 name = "pdf-extract"
-version = "0.6.5-alpha.0"
+version = "0.6.5-alpha.3"
 license = "MIT"
 documentation = "https://docs.rs/crate/pdf-extract/"
-description = "A library to extract content from pdfs"
+description = "A library to extract content from pdfs (patched version for hun-law) (patched again for field)"
 keywords = ["pdf2text", "text", "pdf", "pdf2txt"]
-repository = "https://github.com/jrmuizel/pdf-extract"
-edition = "2018"
+repository = "https://github.com/qpixel/pdf-extract"
+edition = "2021"
 
 [profile.release]
 debug = true
@@ -16,8 +16,10 @@ debug = true
 adobe-cmap-parser = "0.3.3"
 encoding = "0.2.33"
 euclid = "0.20.5"
-lopdf = { version = "0.26", default-features = false, features = [ "pom_parser" ] }
 linked-hash-map = "=0.5.3"
+log = "0.4.14"
+lopdf = { git = "https://github.com/J-F-Liu/lopdf.git", default-features = false, features = [ "nom_parser" ] }
+phf = { version = "0.10", features = ["macros"] }
 postscript = "0.14"
 type1-encoding-parser = "0.1.0"
 unicode-normalization = "0.1.19"
18 changes: 10 additions & 8 deletions examples/extract.rs
@@ -1,13 +1,12 @@
extern crate pdf_extract;
extern crate lopdf;

use lopdf::*;
use pdf_extract::*;
use std::env;
use std::path::PathBuf;
use std::path;
use std::io::BufWriter;
use std::fs::File;
use pdf_extract::*;
use lopdf::*;
use std::io::BufWriter;
use std::path;
use std::path::PathBuf;

fn main() {
//let output_kind = "html";
@@ -21,13 +20,16 @@ fn main() {
     let mut output_file = PathBuf::new();
     output_file.push(filename);
     output_file.set_extension(&output_kind);
-    let mut output_file = BufWriter::new(File::create(output_file).expect("could not create output"));
+    let mut output_file =
+        BufWriter::new(File::create(output_file).expect("could not create output"));
     let doc = Document::load(path).unwrap();
 
     print_metadata(&doc);
 
     let mut output: Box<dyn OutputDev> = match output_kind.as_ref() {
-        "txt" => Box::new(PlainTextOutput::new(&mut output_file as &mut dyn std::io::Write)),
+        "txt" => Box::new(PlainTextOutput::new(
+            &mut output_file as &mut dyn std::io::Write,
+        )),
         "html" => Box::new(HTMLOutput::new(&mut output_file)),
         "svg" => Box::new(SVGOutput::new(&mut output_file)),
         _ => panic!(),
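
Review note: several commits in this PR replace panics with graceful handling (unknown char_decode, unknown color spaces, unknown smask entries), yet the example still ends its `match` with `_ => panic!()`. A minimal sketch of one way to report an unsupported output kind as an error instead is shown here. `OutputKind`, `UnknownOutputKind`, and `parse_output_kind` are hypothetical names introduced for illustration; they are not part of pdf-extract or of this PR, and the sketch deliberately avoids the crate's own types.

```rust
use std::fmt;

/// Hypothetical stand-in for the three output backends the example chooses between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputKind {
    Txt,
    Html,
    Svg,
}

/// Hypothetical error carrying the output kind that was not recognised.
#[derive(Debug, PartialEq, Eq)]
struct UnknownOutputKind(String);

impl fmt::Display for UnknownOutputKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unsupported output kind {:?} (expected txt, html or svg)",
            self.0
        )
    }
}

impl std::error::Error for UnknownOutputKind {}

/// Map an output kind string to a variant, returning an error
/// for anything unrecognised rather than panicking.
fn parse_output_kind(kind: &str) -> Result<OutputKind, UnknownOutputKind> {
    match kind {
        "txt" => Ok(OutputKind::Txt),
        "html" => Ok(OutputKind::Html),
        "svg" => Ok(OutputKind::Svg),
        other => Err(UnknownOutputKind(other.to_string())),
    }
}
```

With something along these lines, `main` could validate the requested kind before creating the output file and print the error message on failure, so a mistyped kind does not leave an empty file behind.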
4,283 changes: 4,266 additions & 17 deletions src/core_fonts.rs

Large diffs are not rendered by default.
