Translated PDF text overlaps the original (high-quality scan) #235

Closed
eliasjudin opened this issue Dec 15, 2024 · 4 comments

Comments

@eliasjudin

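I tried to use this project to translate a high-quality scanned PDF, but the translated text is drawn on top of the original text, which makes the result unreadable. This may be because the PDF contains no text components, only scanned images.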

The original PDF and the translated PDF are attached for reference.

original.pdf

pdf2zh.pdf

pdf2zh original.pdf -li ru -lo en -p 2 -f "(CM[^R]|(MS|XY|MT|BL|RM|EU|LA|RS)[A-Z]|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"

@Byaidu
Owner

Byaidu commented Dec 15, 2024

#19

@Byaidu Byaidu closed this as completed Dec 15, 2024
@hellofinch
Contributor

Scanned files are not well supported. See #19.

@eliasjudin
Author

Scanned files are not well supported. See #19.

So the file requires a text layer? If I run OCR to add a text layer, will it work?

@hellofinch
Contributor

I'm not sure. The text is extracted from the PDF itself, not from the image.
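
Since the text is read from the PDF rather than from the page images, it can help to check whether a file has any extractable text before translating it. The following is a minimal sketch of such a check, not part of pdf2zh. It assumes the third-party PyMuPDF package (imported as `fitz`) is installed; the helper names `is_scan_only` and `extract_page_texts` and the 20-character threshold are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def is_scan_only(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True if every page has fewer than min_chars non-whitespace characters.

    The threshold is an arbitrary assumption: a scanned page may still carry
    a few stray characters, so "almost no text" is treated as "no text layer".
    """
    return all(len("".join(text.split())) < min_chars for text in page_texts)


def extract_page_texts(path: str) -> List[str]:
    """Read the text of each page with PyMuPDF (third-party, assumed installed)."""
    import fitz  # PyMuPDF; imported lazily so the pure helper above works without it

    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]
```

Calling `is_scan_only(extract_page_texts("original.pdf"))` on the attached file would indicate whether it is image-only. Whether adding a text layer with an OCR tool is enough for the translation to lay out correctly is not confirmed anywhere in this thread.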
