make use of info.cluster #197

Open
replabrobin opened this issue Jun 10, 2024 · 3 comments
Comments

@replabrobin

replabrobin commented Jun 10, 2024

I wanted to set the cluster values in the buffer's glyph infos before shaping, so that I could use the returned cluster numbers as a guide to the input colours, etc. To make this possible I had to add a setter:

diff --git a/src/uharfbuzz/_harfbuzz.pyx b/src/uharfbuzz/_harfbuzz.pyx
index 5adf637..ead947e 100644
--- a/src/uharfbuzz/_harfbuzz.pyx
+++ b/src/uharfbuzz/_harfbuzz.pyx
@@ -69,6 +69,10 @@ cdef class GlyphInfo:
     def cluster(self) -> int:
         return self._hb_glyph_info.cluster
 
+    @cluster.setter
+    def cluster(self, v: int) -> None:
+        self._hb_glyph_info.cluster = v
+
     @property
     def flags(self) -> GlyphFlags:
         return GlyphFlags(self._hb_glyph_info.mask & HB_GLYPH_FLAG_DEFINED)

However, although I can set the cluster values before shaping, the returned clusters are all zero.

For example, this code

#!/usr/bin/env python
import sys

import uharfbuzz as hb

if len(sys.argv) > 2:
    # Font file and text supplied on the command line
    fontfile = sys.argv[1]
    text = sys.argv[2]
else:
    fontfile = '/home/robin/devel/reportlab/REPOS/reportlab/tmp/NotoSansKhmer/NotoSansKhmer-Regular.ttf'
    # U+1786 Khmer Letter Cha
    # U+17D2 Khmer Sign Coeng
    # U+1793 Khmer Letter No
    # U+17B6 Khmer Vowel Sign Aa
    # U+17C6 Khmer Sign Nikahit
    text = '\u1786\u17D2\u1793\u17B6\u17C6'

blob = hb.Blob.from_file_path(fontfile)
face = hb.Face(blob)
font = hb.Font(face)

buf = hb.Buffer()
buf.add_str(text)
infos = buf.glyph_infos
print(f'initial {len(infos)=}')
# Number each input character (needs the cluster setter patched in above)
for i, info in enumerate(infos):
    info.cluster = i
buf.guess_segment_properties()
infos = buf.glyph_infos
print(f'guessed {len(infos)=} {[info.cluster for info in infos]}')

features = {"kern": True, "liga": True}
hb.shape(font, buf, features)

infos = buf.glyph_infos
positions = buf.glyph_positions

for info, pos in zip(infos, positions):
    gid = info.codepoint
    glyph_name = font.glyph_to_string(gid)
    cluster = info.cluster
    x_advance = pos.x_advance
    x_offset = pos.x_offset
    y_offset = pos.y_offset
    print(f"{glyph_name} gid{gid}={cluster}@{x_advance},{y_offset}+{x_advance}")

produces this output:

$ tmp/tuharfbuzz 
initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=0@0,-29+0

All of the returned clusters are zero.

I find that if I set buf.cluster_level = 1 right after creating the buffer, the clusters do differ, i.e. gid137 gets cluster value 4:

initial len(infos)=5
guessed len(infos)=5 [0, 1, 2, 3, 4]
uni178617B6 gid248=0@923,0+923
uni17D21793 gid209=0@0,-26+0
uni17C6 gid137=4@0,-29+0
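The goal from the first post, colouring glyphs by the colour of the input characters they came from, can be sketched in plain Python once cluster values are available. This is a hypothetical helper, not part of uharfbuzz; it assumes the clusters are character indices into the input string (which matches the `[0, 1, 2, 3, 4]` values printed before shaping) and reuses the glyph names and clusters from the `cluster_level = 1` run.

```python
from typing import List, Sequence, Tuple


def colour_glyphs(
    glyphs: Sequence[Tuple[str, int]],
    char_colours: Sequence[str],
) -> List[Tuple[str, str]]:
    """Give each shaped glyph the colour of the input character its cluster starts at.

    glyphs: (glyph_name, cluster) pairs in output order.
    char_colours: one colour per input character, indexed by character position.
    Hypothetical helper for illustration only.
    """
    return [(name, char_colours[cluster]) for name, cluster in glyphs]


# Glyph names and clusters copied from the cluster_level = 1 output above.
shaped = [("uni178617B6", 0), ("uni17D21793", 0), ("uni17C6", 4)]
# One illustrative colour per input character: Cha, Coeng, No, Aa, Nikahit.
colours = ["red", "green", "blue", "black", "orange"]

for name, colour in colour_glyphs(shaped, colours):
    print(name, colour)
```

This prints `red` for the two ligature glyphs and `orange` for `uni17C6`. With the default level every glyph reports cluster 0, so every glyph would take the colour of the first character; and a glyph whose cluster covers several characters can only get one colour this way.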
@justvanrossum
Collaborator

I don't think you are ever supposed to set the cluster values manually. HarfBuzz does that for you, but there are three cluster "levels", which give different results:

  • hb.BufferClusterLevel.DEFAULT aka hb.BufferClusterLevel.MONOTONE_GRAPHEMES
  • hb.BufferClusterLevel.MONOTONE_CHARACTERS
  • hb.BufferClusterLevel.CHARACTERS

https://harfbuzz.github.io/working-with-harfbuzz-clusters.html

In the context of your example, you would set the level like this:

buf.cluster_level = hb.BufferClusterLevel.CHARACTERS
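With one of the two monotone levels (`MONOTONE_GRAPHEMES` or `MONOTONE_CHARACTERS`), cluster values can be turned back into ranges of input characters: a cluster starts at its own value and runs up to the next larger cluster value, or to the end of the text. The sketch below assumes exactly that and that the clusters are character indices; it does not apply to `CHARACTERS`, where the values need not be monotone. The helper name is hypothetical, not a uharfbuzz API.

```python
from typing import Dict, List, Sequence


def cluster_spans(clusters: Sequence[int], text_length: int) -> Dict[int, range]:
    """Map each distinct cluster value to the input character indices it covers.

    Assumes a monotone cluster level and character-index clusters, so a cluster
    covers everything from its value up to the next larger cluster value.
    """
    starts: List[int] = sorted(set(clusters))
    ends = starts[1:] + [text_length]
    return {start: range(start, end) for start, end in zip(starts, ends)}


# Clusters from the cluster_level = 1 run above, for the 5-character Khmer text.
print(cluster_spans([0, 0, 4], 5))
```

This prints `{0: range(0, 4), 4: range(4, 5)}`: the two ligature glyphs together cover Cha, Coeng, No and Aa, while Nikahit has its own cluster.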

@replabrobin
Author

Thanks for that info; I don't need the cluster setter then. I really don't want to get into the horrid details of HarfBuzz; the layout problems that come from using a shaper are enough. I suppose ReportLab will need a new kind of font to allow input shaping, and after line breaking the line drawing will need additional positioning. I doubt that we will end up with just one way to do it :(

@behdad
Member

behdad commented Jun 11, 2024

Setting clusters on the buffer is sometimes useful. For example, in hb-view we reset them to Unicode character indices instead of UTF-8 byte offsets.
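The remapping mentioned above can be illustrated with a pure-Python sketch of the same idea; it is not hb-view's code, which is C. When text is added as UTF-8 (for example with HarfBuzz's `hb_buffer_add_utf8()`), each character's initial cluster is the byte offset where it starts, and those offsets can be rewritten as character indices. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def utf8_to_char_index(text: str) -> Dict[int, int]:
    """Map the UTF-8 byte offset at which each character starts to its character index."""
    mapping: Dict[int, int] = {}
    offset = 0
    for index, char in enumerate(text):
        mapping[offset] = index
        offset += len(char.encode("utf-8"))
    return mapping


def reset_clusters(clusters: Sequence[int], text: str) -> List[int]:
    """Rewrite UTF-8 byte-offset cluster values as Unicode character indices."""
    mapping = utf8_to_char_index(text)
    return [mapping[cluster] for cluster in clusters]


# Each Khmer character in U+1780..U+17FF takes 3 bytes in UTF-8.
khmer = "\u1786\u17D2\u1793\u17B6\u17C6"
print(reset_clusters([0, 0, 12], khmer))
```

This prints `[0, 0, 4]`, i.e. byte offset 12 becomes character index 4, matching the character-index clusters seen in the `cluster_level = 1` output.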
