
Paragraph formatting lost for text excluded from translation by font #121

Open
OceanApart opened this issue Nov 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@OceanApart

Problem description

[screenshot]

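Code excluded from translation via the font filter is correctly left untranslated, but its paragraph structure is lost. I'd like the multi-line structure of these paragraphs to be preserved.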

Test document

Important

Please provide the PDF document used to reproduce the issue.

pdf2zh "/tmp/tmpm_l2jpr3/input.pdf" -lo zh -s deeplx -f "(NCO.*|NCN.*)" -p 1

Fundamentals of Computer Graphics, Fourth Edition ( PDFDrive ) 第619 - 629页 第 1 页.pdf
translated_Fundamentals of Computer Graphics Fourth Edition PDFDrive 第619 - 629页 第 1 页.pdf

@Byaidu
Owner

Byaidu commented Nov 24, 2024

We plan to add a model that detects code blocks to solve this; see #37.

Byaidu added the bug (Something isn't working) label on Nov 25, 2024