This document lists the per-character shaping information needed to shape N'Ko text.
N'Ko glyphs should be classified as in the following table. Codepoints in the NKo block with no assigned meaning are designated as unassigned in the Unicode category column.
The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
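The joining types in the table can be turned into a simple lookup. The Python sketch below hard-codes the ranges from the table. The function names `nko_joining_type` and `nearest_non_transparent` are hypothetical and not part of any shaping library. The sketch shows how a TRANSPARENT mark is skipped when a shaper looks for the neighbor that determines a letter's joining behavior.

```python
from typing import Optional

def nko_joining_type(cp: int) -> Optional[str]:
    """Joining type for a codepoint, transcribed from the N'Ko table.

    Returns None for the two unassigned codepoints. As a simplification
    (an assumption of this sketch), anything outside U+07C0..U+07FF is
    treated as NON_JOINING.
    """
    if 0x07CA <= cp <= 0x07EA:                   # letters A .. Jona Ra
        return "DUAL"
    if 0x07EB <= cp <= 0x07F3 or cp == 0x07FD:   # combining marks
        return "TRANSPARENT"
    if cp == 0x07FA:                             # Lajanyalan
        return "JOIN_CAUSING"
    if cp in (0x07FB, 0x07FC):                   # unassigned
        return None
    return "NON_JOINING"                         # digits, symbols, punctuation, other

def nearest_non_transparent(text: str, i: int, step: int) -> Optional[str]:
    """Walk from index i in direction step (-1 = earlier, +1 = later in logical
    order), skipping TRANSPARENT characters, and return the joining type of the
    first character found. None means the text edge (or an unassigned codepoint)."""
    j = i + step
    while 0 <= j < len(text):
        jt = nko_joining_type(ord(text[j]))
        if jt != "TRANSPARENT":
            return jt
        j += step
    return None
```

For example, in the sequence Ba, Combining Short High Tone, A (`"\u07D3\u07EB\u07CA"`), the tone mark is skipped. Ba therefore sees A as its following neighbor, and A sees Ba as its preceding neighbor.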
The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.
Assigned codepoints with a null in the Joining group column receive no special treatment from the shaping engine during the join-computation stage.
Note: No codepoints in the NKo block are assigned a non-null Joining group.
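The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography. Within a run of adjacent marks, marks are sorted by class. Marks with different classes can therefore change position: for example, the nasalization mark (class 220) is placed before a tone mark (class 230). Marks that share the same class keep their relative order.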
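The reordering described above is Unicode's canonical ordering algorithm, which amounts to a stable sort of each maximal run of non-zero-class characters. Below is a minimal Python sketch using the standard `unicodedata.combining` function. The name `canonical_reorder` is hypothetical. A shaping engine's own mark-reordering stage may apply additional, script-specific rules.

```python
import unicodedata

def canonical_reorder(text: str) -> str:
    """Stable-sort every maximal run of characters with a non-zero canonical
    combining class by that class; class-0 characters act as run boundaries."""
    out = []
    run = []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A base (class 0) ends the current mark run: flush it, sorted.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))  # trailing run
    return "".join(out)

# A + Combining Short High Tone (230) + Combining Nasalization Mark (220)
s = "\u07CA\u07EB\u07F2"
reordered = canonical_reorder(s)  # nasalization mark now precedes the tone mark
```

Because N'Ko characters have no canonical decompositions, the result matches `unicodedata.normalize("NFD", s)` for this input.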
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+07C0 | Number | NON_JOINING | null | 0 | ߀ Digit Zero |
U+07C1 | Number | NON_JOINING | null | 0 | ߁ Digit One |
U+07C2 | Number | NON_JOINING | null | 0 | ߂ Digit Two |
U+07C3 | Number | NON_JOINING | null | 0 | ߃ Digit Three |
U+07C4 | Number | NON_JOINING | null | 0 | ߄ Digit Four |
U+07C5 | Number | NON_JOINING | null | 0 | ߅ Digit Five |
U+07C6 | Number | NON_JOINING | null | 0 | ߆ Digit Six |
U+07C7 | Number | NON_JOINING | null | 0 | ߇ Digit Seven |
U+07C8 | Number | NON_JOINING | null | 0 | ߈ Digit Eight |
U+07C9 | Number | NON_JOINING | null | 0 | ߉ Digit Nine |
U+07CA | Letter | DUAL | null | 0 | ߊ A |
U+07CB | Letter | DUAL | null | 0 | ߋ Ee |
U+07CC | Letter | DUAL | null | 0 | ߌ I |
U+07CD | Letter | DUAL | null | 0 | ߍ E |
U+07CE | Letter | DUAL | null | 0 | ߎ U |
U+07CF | Letter | DUAL | null | 0 | ߏ Oo |
U+07D0 | Letter | DUAL | null | 0 | ߐ O |
U+07D1 | Letter | DUAL | null | 0 | ߑ Dagbasinna |
U+07D2 | Letter | DUAL | null | 0 | ߒ N |
U+07D3 | Letter | DUAL | null | 0 | ߓ Ba |
U+07D4 | Letter | DUAL | null | 0 | ߔ Pa |
U+07D5 | Letter | DUAL | null | 0 | ߕ Ta |
U+07D6 | Letter | DUAL | null | 0 | ߖ Ja |
U+07D7 | Letter | DUAL | null | 0 | ߗ Cha |
U+07D8 | Letter | DUAL | null | 0 | ߘ Da |
U+07D9 | Letter | DUAL | null | 0 | ߙ Ra |
U+07DA | Letter | DUAL | null | 0 | ߚ Rra |
U+07DB | Letter | DUAL | null | 0 | ߛ Sa |
U+07DC | Letter | DUAL | null | 0 | ߜ Gba |
U+07DD | Letter | DUAL | null | 0 | ߝ Fa |
U+07DE | Letter | DUAL | null | 0 | ߞ Ka |
U+07DF | Letter | DUAL | null | 0 | ߟ La |
U+07E0 | Letter | DUAL | null | 0 | ߠ Na Woloso |
U+07E1 | Letter | DUAL | null | 0 | ߡ Ma |
U+07E2 | Letter | DUAL | null | 0 | ߢ Nya |
U+07E3 | Letter | DUAL | null | 0 | ߣ Na |
U+07E4 | Letter | DUAL | null | 0 | ߤ Ha |
U+07E5 | Letter | DUAL | null | 0 | ߥ Wa |
U+07E6 | Letter | DUAL | null | 0 | ߦ Ya |
U+07E7 | Letter | DUAL | null | 0 | ߧ Nya Woloso |
U+07E8 | Letter | DUAL | null | 0 | ߨ Jona Ja |
U+07E9 | Letter | DUAL | null | 0 | ߩ Jona Cha |
U+07EA | Letter | DUAL | null | 0 | ߪ Jona Ra |
U+07EB | Mark [Mn] | TRANSPARENT | null | 230 | ߫ Combining Short High Tone |
U+07EC | Mark [Mn] | TRANSPARENT | null | 230 | ߬ Combining Short Low Tone |
U+07ED | Mark [Mn] | TRANSPARENT | null | 230 | ߭ Combining Short Rising Tone |
U+07EE | Mark [Mn] | TRANSPARENT | null | 230 | ߮ Combining Long Descending Tone |
U+07EF | Mark [Mn] | TRANSPARENT | null | 230 | ߯ Combining Long High Tone |
U+07F0 | Mark [Mn] | TRANSPARENT | null | 230 | ߰ Combining Long Low Tone |
U+07F1 | Mark [Mn] | TRANSPARENT | null | 230 | ߱ Combining Long Rising Tone |
U+07F2 | Mark [Mn] | TRANSPARENT | null | 220 | ߲ Combining Nasalization Mark |
U+07F3 | Mark [Mn] | TRANSPARENT | null | 230 | ߳ Combining Double Dot Above |
U+07F4 | Letter modifier | NON_JOINING | null | 0 | ߴ High Tone Apostrophe |
U+07F5 | Letter modifier | NON_JOINING | null | 0 | ߵ Low Tone Apostrophe |
U+07F6 | Symbol | NON_JOINING | null | 0 | ߶ Symbol Oo Dennen |
U+07F7 | Symbol | NON_JOINING | null | 0 | ߷ Symbol Gbakurunen |
U+07F8 | Punctuation | NON_JOINING | null | 0 | ߸ Comma |
U+07F9 | Punctuation | NON_JOINING | null | 0 | ߹ Exclamation Mark |
U+07FA | Letter modifier | JOIN_CAUSING | null | 0 | ߺ Lajanyalan |
U+07FB | unassigned | | | | |
U+07FC | unassigned | | | | |
U+07FD | Mark [Mn] | TRANSPARENT | null | 220 | ߽ Dantayalan |
U+07FE | Symbol | NON_JOINING | null | 0 | ߾ Dorome Sign |
U+07FF | Symbol | NON_JOINING | null | 0 | ߿ Taman Sign |
Other important characters that may be encountered when shaping runs of N'Ko text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right mark (U+200E) and right-to-left mark (U+200F), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
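A minimal Python sketch of one way an engine might handle a combining mark that has nothing to attach to is shown below. The policy is an assumption made for illustration: a mark is treated as orphaned when it starts the text or follows an ordinary space, tab, or newline. In that case U+25CC is inserted before it. Hyphens, dashes, and the no-break space are left alone because real-world text uses them as carriers. The names `insert_dotted_circles` and `NON_CARRIERS` are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
# Characters assumed (for this sketch only) to be unable to carry a mark.
NON_CARRIERS = {" ", "\t", "\n"}

def insert_dotted_circles(text: str) -> str:
    """Insert U+25CC before any run of non-spacing marks (category Mn)
    that has no base to attach to, under the assumed policy above."""
    out = []
    has_base = False
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            if not has_base:
                out.append(DOTTED_CIRCLE)
                has_base = True  # later marks in the run share this circle
        else:
            has_base = ch not in NON_CARRIERS
        out.append(ch)
    return "".join(out)
```

Under this policy, a lone Combining Short High Tone becomes `"\u25CC\u07EB"`, while the same mark after U+2010 HYPHEN or U+00A0 is left as written.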
Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
---|---|---|---|---|---|
U+00A0 | Separator | NON_JOINING | null | 0 | No-break space |
U+034F | Other | NON_JOINING | null | 0 | ͏ Combining grapheme joiner |
U+200C | Other | NON_JOINING | null | 0 | Zero-width non-joiner |
U+200D | Other | JOIN_CAUSING | null | 0 | Zero-width joiner |
U+200E | Other | NON_JOINING | null | 0 | Left-to-Right marker |
U+200F | Other | NON_JOINING | null | 0 | Right-to-Left marker |
U+2010 | Punctuation | NON_JOINING | null | 0 | ‐ Hyphen |
U+2011 | Punctuation | NON_JOINING | null | 0 | ‑ No-break hyphen |
U+2012 | Punctuation | NON_JOINING | null | 0 | ‒ Figure dash |
U+2013 | Punctuation | NON_JOINING | null | 0 | – En dash |
U+2014 | Punctuation | NON_JOINING | null | 0 | — Em dash |
U+25CC | Symbol | NON_JOINING | null | 0 | ◌ Dotted circle |
The combining grapheme joiner (CGJ) is primarily used to block the reordering of adjacent marks during the mark-reordering stage. This keeps the marks in the order required by a non-default language orthography.
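Because CGJ has combining class 0, it splits a run of marks in two, so canonical reordering cannot move marks across it. The Python sketch below demonstrates this with Unicode normalization from the standard `unicodedata` module. Normalization is used here as a stand-in for a shaping engine's mark-reordering stage, which is an assumption: an engine may apply extra rules of its own.

```python
import unicodedata

A, HIGH_TONE, NASAL, CGJ = "\u07CA", "\u07EB", "\u07F2", "\u034F"

# Tone mark (class 230) typed before the nasalization mark (class 220).
plain = A + HIGH_TONE + NASAL
# Same sequence with CGJ (class 0) separating the two marks.
with_cgj = A + HIGH_TONE + CGJ + NASAL

# Without CGJ the marks are sorted by class: NASAL moves ahead of HIGH_TONE.
reordered = unicodedata.normalize("NFC", plain)
# With CGJ each mark sits in its own run, so the typed order survives.
preserved = unicodedata.normalize("NFC", with_cgj)
```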
The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.
For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence "Letter, ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ, Letter, ZWJ" would be used.
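The effect of ZWJ on form selection can be sketched in Python. This is a minimal sketch under a simplified rule that is an assumption of the example: a DUAL letter connects on a side when the nearest non-transparent neighbor on that side (in logical order) is DUAL or JOIN_CAUSING. Because every N'Ko letter is DUAL, the sketch does not model right-joining or left-joining letters. The form labels borrow the OpenType feature tags `init`, `medi`, `fina`, and `isol`. The function names are hypothetical.

```python
from typing import List, Optional

ZWJ = "\u200D"
JOINS = {"DUAL", "JOIN_CAUSING"}

def joining_type(ch: str) -> str:
    """Joining type per the tables above, reduced to what this sketch needs."""
    cp = ord(ch)
    if 0x07CA <= cp <= 0x07EA:
        return "DUAL"
    if 0x07EB <= cp <= 0x07F3 or cp == 0x07FD:
        return "TRANSPARENT"
    if cp in (0x07FA, 0x200D):        # Lajanyalan, ZWJ
        return "JOIN_CAUSING"
    return "NON_JOINING"

def _neighbor(text: str, i: int, step: int) -> Optional[str]:
    """Joining type of the nearest non-transparent character, or None at an edge."""
    j = i + step
    while 0 <= j < len(text):
        jt = joining_type(text[j])
        if jt != "TRANSPARENT":
            return jt
        j += step
    return None

def positional_forms(text: str) -> List[Optional[str]]:
    """Form label for each DUAL letter; None for every other character."""
    forms: List[Optional[str]] = []
    for i, ch in enumerate(text):
        if joining_type(ch) != "DUAL":
            forms.append(None)
            continue
        joins_before = _neighbor(text, i, -1) in JOINS
        joins_after = _neighbor(text, i, +1) in JOINS
        if joins_before and joins_after:
            forms.append("medi")
        elif joins_before:
            forms.append("fina")
        elif joins_after:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

With Ba (U+07D3), `positional_forms("\u07D3" + ZWJ)` labels Ba `init`, and `positional_forms(ZWJ + "\u07D3" + ZWJ)` labels it `medi`. These match the "Letter, ZWJ" and "ZWJ, Letter, ZWJ" sequences described above.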
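The right-to-left mark (RLM) and left-to-right mark (LRM) are invisible characters with strong directionality. The Unicode Bidirectional Algorithm (BiDi) uses them to resolve the direction of neighboring neutral characters, and therefore to determine the points in a text run at which the writing direction changes.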
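One visible consequence of this strong directionality is paragraph direction, which BiDi takes from the first strong character. The Python sketch below implements a simplified form of that rule using the standard `unicodedata.bidirectional` function. It ignores isolate initiators, which the full rule skips over. The function name is hypothetical. N'Ko letters have bidi class R, so a leading LRM is enough to make a paragraph of N'Ko text resolve as left-to-right.

```python
import unicodedata
from typing import Optional

def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strong character (bidi class
    L, R, or AL), or None if there is none. Simplified: isolates are ignored."""
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc == "L":
            return "ltr"
        if bc in ("R", "AL"):
            return "rtl"
    return None
```

For example, `first_strong_direction("\u07CA")` gives `"rtl"`, while `first_strong_direction("\u200E\u07CA")` gives `"ltr"` because the LRM is encountered first.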
The no-break space is primarily used to display codepoints that are defined as non-spacing (such as the N'Ko tone marks and the nasalization mark) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.