feat: dynamic line space #431

Merged
merged 2 commits into from
Jan 8, 2025

Contributor

timelic commented Jan 8, 2025

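Adds support for reducing the line spacing when the default spacing would make the translated text overflow the original paragraph.

Screenshot 2025-01-08 17 08 47

Paragraph overlap is especially severe in Japanese, because Japanese text is inherently longer and the use of katakana makes paragraphs run long. As shown in the screenshot, the line spacing of the paragraph on the right has been reduced.

Screenshot 2025-01-08 20 45 59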
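
For readers who want a concrete picture of the idea, here is a minimal sketch, not the code merged in this PR. It assumes a paragraph is described only by its number of translated lines, its font size, and the height of the original text box. The function name `fit_line_spacing` and the defaults (`default_spacing`, `min_spacing`, `step`) are hypothetical.

```python
def fit_line_spacing(num_lines, font_size, box_height,
                     default_spacing=1.4, min_spacing=1.0, step=0.05):
    """Shrink the line-spacing factor in small steps until the text fits.

    All parameter names and default values are illustrative assumptions.
    The spacing is only ever reduced from the default, never increased.
    """
    spacing = default_spacing
    # Keep shrinking while the paragraph is taller than its original box
    # and the lower bound has not been reached yet.
    while spacing > min_spacing and num_lines * font_size * spacing > box_height:
        spacing = round(spacing - step, 2)
    return max(spacing, min_spacing)


# A 10-line paragraph at 10 pt that must fit into a 121 pt tall box.
print(fit_line_spacing(10, 10, 121))
```

The lower bound matters: once `min_spacing` is reached the function gives up, so a paragraph that is far too long still overflows rather than becoming unreadable.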

Contributor Author

timelic commented Jan 8, 2025

A follow-up to #297.

When a paragraph contains many LaTeX formulas, the translated content easily overflows the original paragraph.

Collaborator

awwaawwa commented Jan 8, 2025

If you are not in a hurry, you can wait for my new backend. My new backend will completely decouple the parsing and typesetting steps, and use IL representation in XML format. Making these improvements on the new backend should be much easier.
Here's my implementation of the code for #297 (comment)

Collaborator

awwaawwa commented Jan 8, 2025

I personally do not recommend dynamically reducing line spacing, as I believe it significantly impacts the visual experience. However, I am quite interested in dynamically expanding line spacing, hahaha.

Contributor Author

timelic commented Jan 8, 2025

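@awwaawwa Dynamic line spacing only shrinks the spacing below the default value, to avoid overlapping the text block underneath. It never increases the spacing in a way that would cover the original text block.

I saw #297 (comment), and it does look very good, but I worry that it will make the body font size keep changing throughout the PDF.

In typographic design, font size is often used to make a piece of text stand out. If one paragraph is set larger than the others, readers can easily misread it as emphasis by the author.

Or does #297 (comment) actually collect statistics over the whole page or the whole document to find the best body font size? If so, that would be great.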

Collaborator

awwaawwa commented Jan 8, 2025

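@timelic For now it only adjusts the font size dynamically to fit each paragraph. Later I will try collecting statistics over the full text to determine the best font size. The information in the current XML IL is enough to support this; I just haven't had time to work on it.

I think line spacing should not be too small, or it hurts the reading experience. I would rather have a smaller font size. That said, moderate dynamic adjustment is acceptable, and I will find time to implement it in the new backend.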
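
To make the "statistics over the full text" idea concrete, here is a rough sketch under stated assumptions. It is not the new backend's implementation. It assumes that for each paragraph we already know the largest font size at which its translation fits the original box (`fit_sizes`), and it picks one body size that most paragraphs can accommodate. The function name and the `skip_percent` and `min_ratio` parameters are hypothetical.

```python
def choose_body_font_size(fit_sizes, original_size, skip_percent=10, min_ratio=0.8):
    """Pick a single body font size for the whole document.

    fit_sizes: per-paragraph largest font size that still fits (assumed input).
    skip_percent: share of the tightest paragraphs allowed to not fit.
    min_ratio: never go below this fraction of the original size.
    """
    if not fit_sizes:
        return original_size
    ordered = sorted(fit_sizes)
    # Skip the tightest few paragraphs so one outlier does not shrink everything.
    idx = len(ordered) * skip_percent // 100
    candidate = ordered[idx]
    # Never enlarge beyond the original size, never shrink below the floor.
    return max(min(candidate, original_size), original_size * min_ratio)


sizes = [6.0, 9.0, 9.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
print(choose_body_font_size(sizes, original_size=10.0))
```

Using a single size for the body text avoids the problem raised earlier in the thread, where a paragraph with a different size could be misread as emphasis. The few paragraphs that still do not fit would need another remedy, such as reduced line spacing.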

Contributor Author

timelic commented Jan 8, 2025

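@awwaawwa Reducing the font size moderately is quite feasible, but I feel the variation should not be too large. If both font size and line spacing can be adjusted so that the result looks more natural, that would be nice too.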

Collaborator

awwaawwa commented Jan 8, 2025

@timelic +1

Contributor Author

timelic commented Jan 8, 2025

@timelic +1

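Actually, the best approach would be to adjust paragraph positions adaptively. For example, if one paragraph overflows its original position but there is some blank space beneath the paragraph below it, the lower paragraph can be moved down a little so that the two no longer overlap. That way neither line spacing nor font size needs to change, and the page still looks good.

It seems hard to do, though, and I don't know whether there is an existing algorithm for it (something like even distribution).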
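
One simple way to approximate this is a greedy top-to-bottom pass within a single column. The sketch below is only an illustration under assumptions: the `Block` type, the `push_down` function, and the `gap` parameter are hypothetical, and y coordinates are taken to grow downward, which is not how PDF coordinates work natively.

```python
from dataclasses import dataclass


@dataclass
class Block:
    top: float     # original top edge (y grows downward in this sketch)
    height: float  # height needed by the translated text


def push_down(blocks, column_bottom, gap=2.0):
    """Greedy pass: move a block down only when the block above overlaps it.

    Returns the new top edges (in top-to-bottom order) and how far the last
    block sticks out past the column bottom (0.0 if everything fits).
    """
    new_tops = []
    cursor = float("-inf")  # lowest y already occupied, plus the gap
    for block in sorted(blocks, key=lambda b: b.top):
        top = max(block.top, cursor)
        new_tops.append(top)
        cursor = top + block.height + gap
    overflow = max(0.0, (cursor - gap) - column_bottom) if blocks else 0.0
    return new_tops, overflow


column = [Block(0, 60), Block(50, 30), Block(120, 20)]
print(push_down(column, column_bottom=200))
```

A non-zero `overflow` means the blank space in the column was not enough, and some other measure (smaller font, tighter spacing) is still needed. This pass never moves blocks upward and never distributes the slack evenly, so it is a lower bound on what a real layout algorithm could do.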

Collaborator

awwaawwa commented Jan 8, 2025

@timelic That would amount to a small-scale PDF reflow. Let's take it one step at a time, hhhhhh.

Collaborator

awwaawwa commented Jan 8, 2025

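@timelic It might be worth looking at how https://github.com/koreader/koreader implements this. After all, reflowing PDFs on small e-ink screens is much harder than what we are doing here.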

Contributor Author

timelic commented Jan 8, 2025

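@awwaawwa Let's wait for the new backend to be merged first; a simple greedy or flexible (elastic) layout may already give good results. What I have in mind is still reflow within a single page, without even crossing columns in a two-column layout. That keeps the implementation simpler and does not lose the advantage of side-by-side bilingual comparison in the PDF.

By the way, how is development of the new backend going?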

@Byaidu Byaidu merged commit f30133e into Byaidu:main Jan 8, 2025
1 of 2 checks passed
Collaborator

awwaawwa commented Jan 8, 2025

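@timelic In terms of major features, the main thing still missing is support for mixed typesetting of formulas and text, and I also need to test some common PDF files and fix the resulting bugs. I'm not yet sure about the finer details, because this is essentially a complete rewrite and many details differ. I expect a transition period during which both the old and new backends are supported.

Also, mixed formula typesetting uses a placeholder approach, which is similar to how rich-text translation is done, so I plan to see whether both can be supported together. So far, though, I have only parsed out the text color information.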
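
As an illustration of what a placeholder approach generally looks like, here is a small sketch. It is not the backend's actual code. It assumes, purely for demonstration, that formulas are already delimited by `$...$` in the extracted text and that placeholders take the form `{v0}`, `{v1}`, and so on. The function names are hypothetical.

```python
import re


def mask_formulas(text, pattern=r"\$[^$]+\$"):
    """Replace each formula with a numbered placeholder before translation."""
    formulas = []

    def _stash(match):
        formulas.append(match.group(0))
        return "{v%d}" % (len(formulas) - 1)

    return re.sub(pattern, _stash, text), formulas


def unmask_formulas(text, formulas):
    """Put the stored formulas back into the translated text."""

    def _restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched instead of raising.
        return formulas[index] if index < len(formulas) else match.group(0)

    return re.sub(r"\{v(\d+)\}", _restore, text)


masked, stored = mask_formulas("Let $x$ be in $R^n$.")
print(masked)
print(unmask_formulas("设 {v0} 属于 {v1}。", stored))
```

The same mechanism could in principle carry rich-text spans (bold, italic, colored runs) instead of formulas, which is presumably why the two features are being considered together.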

Contributor Author

timelic commented Jan 8, 2025

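@awwaawwa That's great! Working on typesetting is truly a great service to everyone!

I hope parsing of bold text can be supported along the way. It could perhaps be inferred from the font name, for example names containing bold, heavy, 700, or 800.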
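
A minimal sketch of that font-name heuristic, assuming only the font name string is available. The function name `looks_bold` is hypothetical. The one PDF-specific detail used here is that subset fonts carry a six-letter tag followed by `+` in front of the name, which is stripped so the tag itself cannot trigger a match.

```python
import re

# Keywords taken from the suggestion above; the list is a heuristic, not a standard.
_BOLD_HINT = re.compile(r"bold|heavy|700|800", re.IGNORECASE)


def looks_bold(font_name):
    """Guess whether a font is bold from its name alone."""
    # Drop a subset tag such as "ABCDEF+" if present.
    base_name = font_name.split("+", 1)[-1]
    return bool(_BOLD_HINT.search(base_name))


for name in ["ABCDEF+TimesNewRoman-Bold", "SourceHanSans-Heavy", "ABCDEF+CMR10"]:
    print(name, looks_bold(name))
```

This will also flag names such as "Semibold", and it misses fonts whose names carry no weight hint at all, so it is best treated as a fallback when better metadata is unavailable.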

Collaborator

awwaawwa commented Jan 8, 2025

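@timelic Bold will have to wait a bit. Later I will reread the PDF specification to see whether the text operators include a standard operation for it. If not, I will look into the fonts. Besides bold, I also want to look at italics, and when translating into Chinese, replace them with a Kai (楷体) typeface. I recall the font metadata also records what style the font is, but that has not made it into the IL yet. I'll get to it later, hhh.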

hellofinch pushed a commit to hellofinch/PDFMathTranslate that referenced this pull request Jan 15, 2025
Collaborator

awwaawwa commented Jan 17, 2025

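@timelic Bold, italic, and rich-text translation are now implemented. P.S. Currently only English-to-Chinese translation is considered; other cases are not handled for now.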

CleanShot 2025-01-18 at 00 37 11@2x


3 participants