Characters ti replaced by ( in pdf #508

Open
martinratinaud opened this issue Aug 8, 2022 · 1 comment

Bug Report 🐛

Transforming this specific PDF file https://assets.website-files.com/615dba2b324d4ea51a398f26/622a2175014d39da4f4bf688_2022%2003%2014%20CGU%20Heetch%20France%20CLEAN.pdf produces garbled text.

Steps to Reproduce

Run:

wget https://assets.website-files.com/615dba2b324d4ea51a398f26/622a2175014d39da4f4bf688_2022%2003%2014%20CGU%20Heetch%20France%20CLEAN.pdf -O heetch.pdf
markus transform --input heetch.pdf --from pdf --to markdown | head -n 15

Current Behavior

1\. Objet



L’applica(on « Heetch » propose un service (ci-après l’ « Applica(on ») des(né à perme?re la

mise en rela(on de personnes recherchant un moyen de transport vers une des(na(on

donnée (ci-après : les « Passagers ») avec un exploitant de voitures de transport avec

chauffeur ou une entreprise inscrite au registre départemental des transports (ci-après : les

« Chauffeurs »).

It seems "ti" is replaced by "(". Similarly, "tt" appears to be replaced by "?" (see "perme?re").

Expected behaviour

Get as close as possible to:

1. Objet
L’application « Heetch » propose un service (ci-après l’ « Application ») destiné à permettre la
mise en relation de personnes recherchant un moyen de transport vers une destination
donnée (ci-après : les « Passagers ») avec un exploitant de voitures de transport avec
chauffeur ou une entreprise inscrite au registre départemental des transports (ci-après : les
« Chauffeurs »). 

Context (Environment)

Desktop

  • OS: macOS
  • Browser: command line
  • Version: markus 0.15.2

MattiSG commented Aug 10, 2022

This happens because "ti" is rendered as a ligature in this document. However, this ligature is not part of Unicode’s Alphabetic Presentation Forms block (whose Latin ligatures occupy U+FB00 to U+FB06), and it has been encoded in the document as a ( character. I can reproduce markus' behaviour by copy-pasting text from the PDF.
The document was produced with Apple Pages.
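
To illustrate the point about Unicode, here is a minimal Python sketch listing the Latin ligatures that do have code points and showing that standard NFKC normalization expands them. It is not part of markus; the helper name `expand_ligatures` is made up for this example. Since "ti" has no code point of its own, normalization has nothing to act on and the stray ( passes through unchanged.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00 to U+FB06).
LATIN_LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]


def expand_ligatures(text: str) -> str:
    """Expand Unicode compatibility ligatures (e.g. U+FB01) into plain letters.

    Hypothetical helper for illustration. Note that NFKC also rewrites other
    compatibility characters (for example, a no-break space becomes a space).
    """
    return unicodedata.normalize("NFKC", text)


# Print each encoded ligature next to its expansion.
for lig in LATIN_LIGATURES:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)} -> {expand_ligatures(lig)}")

# The garbled text from this issue contains no ligature code point,
# so normalization cannot repair it.
print(expand_ligatures("applica(on"))
```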

Is there any way for markus to recover such an encoding?
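
One possible workaround, sketched below in Python, is a post-processing pass over the extracted text. This is only a heuristic under stated assumptions, not an existing markus feature: the `STAND_INS` mapping and the `repair_ligatures` function are hypothetical and inferred solely from the sample output in this issue. The sketch replaces ( and ? only when they sit directly between two letters, which leaves normal parentheses and question marks alone in the sample, but it would misfire on text such as "voiture(s)".

```python
import re

# Hypothetical mapping inferred only from the sample output in this issue;
# other documents or fonts may use different stand-in characters.
STAND_INS = {"(": "ti", "?": "tt"}

# Match a stand-in character only when it has a letter on both sides.
# [^\W\d_] matches any Unicode letter (word character that is not a digit
# or underscore), so accented letters such as "é" are covered.
_PATTERN = re.compile(r"(?<=[^\W\d_])([(?])(?=[^\W\d_])")


def repair_ligatures(text: str) -> str:
    """Replace letter-surrounded stand-ins with the letters they likely represent."""
    return _PATTERN.sub(lambda m: STAND_INS[m.group(1)], text)


garbled = "L’applica(on « Heetch » propose un service (ci-après l’ « Applica(on ») des(né à perme?re la"
print(repair_ligatures(garbled))
```

A more robust fix would probably have to happen at extraction time, using the font information embedded in the PDF, rather than by guessing from the output text.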
