N'Ko character tables

This document lists the per-character shaping information needed to shape N'Ko text.

N'Ko character table

N'Ko codepoints should be classified as in the following table. Codepoints in the N'Ko block with no assigned meaning are designated as unassigned in the Unicode category column.

The Joining type column indicates whether each codepoint is defined as joining with adjacent characters on the left side, right side, left and right sides ("DUAL"), or neither side ("NON_JOINING"). Codepoints designated TRANSPARENT in the Joining type column do not join with adjacent characters and, in addition, do not affect the joining behavior of surrounding characters. Non-spacing marks are of type TRANSPARENT. Codepoints designated JOIN_CAUSING force adjacent characters to join.
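As an illustration of how these joining types interact, the following minimal sketch derives a positional form for each item in a run. It is not taken from any particular shaping engine: the `JoiningType` enum, the `positional_forms` function, and the form names are hypothetical, the input is assumed to be in logical order, and the LEFT and RIGHT types appear only for completeness.

```python
from enum import Enum


class JoiningType(Enum):
    NON_JOINING = "NON_JOINING"
    LEFT = "LEFT"
    RIGHT = "RIGHT"
    DUAL = "DUAL"
    JOIN_CAUSING = "JOIN_CAUSING"
    TRANSPARENT = "TRANSPARENT"


# N'Ko is written right-to-left, so in logical order the *following*
# character sits on the left side and the *preceding* one on the right.
_JOINS_FOLLOWING = {JoiningType.LEFT, JoiningType.DUAL, JoiningType.JOIN_CAUSING}
_JOINS_PRECEDING = {JoiningType.RIGHT, JoiningType.DUAL, JoiningType.JOIN_CAUSING}


def positional_forms(types):
    """Return one form name per input item, or None for TRANSPARENT items."""
    forms = [None] * len(types)
    # TRANSPARENT items are skipped when looking for neighbours.
    visible = [i for i, t in enumerate(types) if t is not JoiningType.TRANSPARENT]
    for k, i in enumerate(visible):
        current = types[i]
        prev_type = types[visible[k - 1]] if k > 0 else None
        next_type = types[visible[k + 1]] if k + 1 < len(visible) else None
        joins_prev = current in _JOINS_PRECEDING and prev_type in _JOINS_FOLLOWING
        joins_next = current in _JOINS_FOLLOWING and next_type in _JOINS_PRECEDING
        if joins_prev and joins_next:
            forms[i] = "medial"
        elif joins_next:
            forms[i] = "initial"
        elif joins_prev:
            forms[i] = "final"
        else:
            forms[i] = "isolated"
    return forms


D, T, U = JoiningType.DUAL, JoiningType.TRANSPARENT, JoiningType.NON_JOINING
print(positional_forms([D, T, D, U, D]))
# ['initial', None, 'final', 'isolated', 'isolated']
```

In the example run, the TRANSPARENT mark between the first two letters does not interrupt their connection, while the NON_JOINING item leaves the last letter isolated. The form reported for a JOIN_CAUSING item is a by-product of the lookup and is not meaningful on its own.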

The Joining group column lists the fundamental letter that the listed codepoint behaves like for joining purposes.

Assigned codepoints with a null in the Joining group column evoke no special behavior from the shaping engine during the join-computation stage.

Note: No codepoints in the N'Ko block are assigned a non-null Joining group.

The Mark class column indicates the Canonical Combining Class for the codepoint. Marks are assigned non-zero combining classes so that sequences of adjacent marks can be reordered as required by the orthography.
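One way to picture this reordering is as a stable sort of each run of adjacent marks by combining class, which is how Unicode canonical ordering treats them. The sketch below is illustrative only: `reorder_marks` is a hypothetical helper, it reads the classes from Python's `unicodedata` module rather than from a shaper's own tables, and a real shaping engine may apply further script-specific rules.

```python
import unicodedata


def reorder_marks(text):
    """Stable-sort each run of characters with a non-zero combining class."""
    out = []
    run = []  # pending marks, in their original order
    for ch in text:
        if unicodedata.combining(ch) != 0:
            run.append(ch)
            continue
        # A class-0 character ends the current run of marks.
        out.extend(sorted(run, key=unicodedata.combining))
        run = []
        out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


# U+07CA (A) followed by a class-230 tone mark and the class-220 nasalization mark.
sample = "\u07CA\u07EB\u07F2"
reordered = reorder_marks(sample)
print([f"U+{ord(c):04X}" for c in reordered])
# ['U+07CA', 'U+07F2', 'U+07EB']
```

Because Python's `sorted` is stable, marks that share a combining class keep their original relative order.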

| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:--------------|:-----------|:------|
| U+07C0 | Number | NON_JOINING | null | 0 | ߀ Digit Zero |
| U+07C1 | Number | NON_JOINING | null | 0 | ߁ Digit One |
| U+07C2 | Number | NON_JOINING | null | 0 | ߂ Digit Two |
| U+07C3 | Number | NON_JOINING | null | 0 | ߃ Digit Three |
| U+07C4 | Number | NON_JOINING | null | 0 | ߄ Digit Four |
| U+07C5 | Number | NON_JOINING | null | 0 | ߅ Digit Five |
| U+07C6 | Number | NON_JOINING | null | 0 | ߆ Digit Six |
| U+07C7 | Number | NON_JOINING | null | 0 | ߇ Digit Seven |
| U+07C8 | Number | NON_JOINING | null | 0 | ߈ Digit Eight |
| U+07C9 | Number | NON_JOINING | null | 0 | ߉ Digit Nine |
| U+07CA | Letter | DUAL | null | 0 | ߊ A |
| U+07CB | Letter | DUAL | null | 0 | ߋ Ee |
| U+07CC | Letter | DUAL | null | 0 | ߌ I |
| U+07CD | Letter | DUAL | null | 0 | ߍ E |
| U+07CE | Letter | DUAL | null | 0 | ߎ U |
| U+07CF | Letter | DUAL | null | 0 | ߏ Oo |
| U+07D0 | Letter | DUAL | null | 0 | ߐ O |
| U+07D1 | Letter | DUAL | null | 0 | ߑ Dagbasinna |
| U+07D2 | Letter | DUAL | null | 0 | ߒ N |
| U+07D3 | Letter | DUAL | null | 0 | ߓ Ba |
| U+07D4 | Letter | DUAL | null | 0 | ߔ Pa |
| U+07D5 | Letter | DUAL | null | 0 | ߕ Ta |
| U+07D6 | Letter | DUAL | null | 0 | ߖ Ja |
| U+07D7 | Letter | DUAL | null | 0 | ߗ Cha |
| U+07D8 | Letter | DUAL | null | 0 | ߘ Da |
| U+07D9 | Letter | DUAL | null | 0 | ߙ Ra |
| U+07DA | Letter | DUAL | null | 0 | ߚ Rra |
| U+07DB | Letter | DUAL | null | 0 | ߛ Sa |
| U+07DC | Letter | DUAL | null | 0 | ߜ Gba |
| U+07DD | Letter | DUAL | null | 0 | ߝ Fa |
| U+07DE | Letter | DUAL | null | 0 | ߞ Ka |
| U+07DF | Letter | DUAL | null | 0 | ߟ La |
| U+07E0 | Letter | DUAL | null | 0 | ߠ Na Woloso |
| U+07E1 | Letter | DUAL | null | 0 | ߡ Ma |
| U+07E2 | Letter | DUAL | null | 0 | ߢ Nya |
| U+07E3 | Letter | DUAL | null | 0 | ߣ Na |
| U+07E4 | Letter | DUAL | null | 0 | ߤ Ha |
| U+07E5 | Letter | DUAL | null | 0 | ߥ Wa |
| U+07E6 | Letter | DUAL | null | 0 | ߦ Ya |
| U+07E7 | Letter | DUAL | null | 0 | ߧ Nya Woloso |
| U+07E8 | Letter | DUAL | null | 0 | ߨ Jona Ja |
| U+07E9 | Letter | DUAL | null | 0 | ߩ Jona Cha |
| U+07EA | Letter | DUAL | null | 0 | ߪ Jona Ra |
| U+07EB | Mark [Mn] | TRANSPARENT | null | 230 | ߫ Combining Short High Tone |
| U+07EC | Mark [Mn] | TRANSPARENT | null | 230 | ߬ Combining Short Low Tone |
| U+07ED | Mark [Mn] | TRANSPARENT | null | 230 | ߭ Combining Short Rising Tone |
| U+07EE | Mark [Mn] | TRANSPARENT | null | 230 | ߮ Combining Long Descending Tone |
| U+07EF | Mark [Mn] | TRANSPARENT | null | 230 | ߯ Combining Long High Tone |
| U+07F0 | Mark [Mn] | TRANSPARENT | null | 230 | ߰ Combining Long Low Tone |
| U+07F1 | Mark [Mn] | TRANSPARENT | null | 230 | ߱ Combining Long Rising Tone |
| U+07F2 | Mark [Mn] | TRANSPARENT | null | 220 | ߲ Combining Nasalization Mark |
| U+07F3 | Mark [Mn] | TRANSPARENT | null | 230 | ߳ Combining Double Dot Above |
| U+07F4 | Letter modifier | NON_JOINING | null | 0 | ߴ High Tone Apostrophe |
| U+07F5 | Letter modifier | NON_JOINING | null | 0 | ߵ Low Tone Apostrophe |
| U+07F6 | Symbol | NON_JOINING | null | 0 | ߶ Symbol Oo Dennen |
| U+07F7 | Punctuation | NON_JOINING | null | 0 | ߷ Symbol Gbakurunen |
| U+07F8 | Punctuation | NON_JOINING | null | 0 | ߸ Comma |
| U+07F9 | Punctuation | NON_JOINING | null | 0 | ߹ Exclamation Mark |
| U+07FA | Letter modifier | JOIN_CAUSING | null | 0 | ߺ Lajanyalan |
| U+07FB | unassigned | | | | |
| U+07FC | unassigned | | | | |
| U+07FD | Mark [Mn] | TRANSPARENT | null | 220 | ߽ Dantayalan |
| U+07FE | Symbol | NON_JOINING | null | 0 | ߾ Dorome Sign |
| U+07FF | Symbol | NON_JOINING | null | 0 | ߿ Taman Sign |
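The Unicode category and Mark class columns can be cross-checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; `describe` is a hypothetical helper, and the results depend on the Unicode version bundled with the interpreter (U+07FD through U+07FF require Unicode 11 or later data). The module reports two-letter general categories such as `Nd`, `Lo`, and `Mn`, which correspond to the coarser labels used in the table.

```python
import unicodedata


def describe(codepoint):
    """Look up the general category, combining class, and name of a codepoint."""
    ch = chr(codepoint)
    return {
        "category": unicodedata.category(ch),
        "mark_class": unicodedata.combining(ch),
        # Unassigned codepoints have no name, so supply a default.
        "name": unicodedata.name(ch, "<unassigned>"),
    }


for cp in (0x07C0, 0x07CA, 0x07EB, 0x07F2, 0x07FB):
    info = describe(cp)
    print(f"U+{cp:04X} {info['category']} {info['mark_class']:>3} {info['name']}")
```

The Joining type column has no counterpart in `unicodedata`; that information comes from the `ArabicShaping.txt` data file, which this sketch does not read.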

Miscellaneous character table

Other important characters that may be encountered when shaping runs of N'Ko text include the dotted-circle placeholder (U+25CC), the combining grapheme joiner (U+034F), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), the left-to-right mark (U+200E) and right-to-left mark (U+200F), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should cope with this situation gracefully.

| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
|:----------|:-----------------|:-------------|:--------------|:-----------|:------|
| U+00A0 | Separator | NON_JOINING | null | 0 |   No-break space |
| U+034F | Other | NON_JOINING | null | 0 | ͏ Combining grapheme joiner |
| U+200C | Other | NON_JOINING | null | 0 | ‌ Zero-width non-joiner |
| U+200D | Other | JOIN_CAUSING | null | 0 | ‍ Zero-width joiner |
| U+200E | Other | NON_JOINING | null | 0 | ‎ Left-to-right mark |
| U+200F | Other | NON_JOINING | null | 0 | ‏ Right-to-left mark |
| U+2010 | Punctuation | NON_JOINING | null | 0 | ‐ Hyphen |
| U+2011 | Punctuation | NON_JOINING | null | 0 | ‑ No-break hyphen |
| U+2012 | Punctuation | NON_JOINING | null | 0 | ‒ Figure dash |
| U+2013 | Punctuation | NON_JOINING | null | 0 | – En dash |
| U+2014 | Punctuation | NON_JOINING | null | 0 | — Em dash |
| U+25CC | Symbol | NON_JOINING | null | 0 | ◌ Dotted circle |

The combining grapheme joiner (CGJ) is primarily used to alter the order in which adjacent marks are positioned during the mark-reordering stage, in order to adhere to the needs of a non-default language orthography.
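The blocking effect of CGJ is easiest to observe through Unicode normalization, which reorders adjacent marks by combining class. The sketch below uses Python's `unicodedata.normalize` as a stand-in, on the assumption that a shaper's mark-reordering stage orders marks by combining class in a comparable way; the constant names and the `codepoints` helper are hypothetical.

```python
import unicodedata

CGJ = "\u034F"
BASE = "\u07CA"          # N'Ko letter A
HIGH_TONE = "\u07EB"     # combining class 230
NASALIZATION = "\u07F2"  # combining class 220


def codepoints(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]


plain = BASE + HIGH_TONE + NASALIZATION
with_cgj = BASE + HIGH_TONE + CGJ + NASALIZATION

# Without CGJ, the class-220 mark is moved ahead of the class-230 mark.
print(codepoints(unicodedata.normalize("NFC", plain)))
# ['U+07CA', 'U+07F2', 'U+07EB']

# CGJ has combining class 0, so it splits the run and the order is kept.
print(codepoints(unicodedata.normalize("NFC", with_cgj)))
# ['U+07CA', 'U+07EB', 'U+034F', 'U+07F2']
```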

The zero-width joiner (ZWJ) is primarily used to force the usage of the cursive connecting form of a letter even when the context of the adjoining letters would not trigger the connecting form.

For example, to show the initial form of a letter in isolation (such as for displaying it in a table of forms), the sequence "Letter,ZWJ" would be used. To show the medial form of a letter in isolation, the sequence "ZWJ,Letter,ZWJ" would be used.
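These sequences can be built with a small helper. The sketch below is illustrative: `isolated_display` and its form names are hypothetical, and the "final" case ("ZWJ,Letter") is an extrapolation by symmetry rather than something stated above.

```python
ZWJ = "\u200D"


def isolated_display(letter, form):
    """Return a string that requests one positional form of `letter` on its own."""
    if form == "initial":
        return letter + ZWJ
    if form == "medial":
        return ZWJ + letter + ZWJ
    if form == "final":
        # Assumed by symmetry with the initial case.
        return ZWJ + letter
    if form == "isolated":
        return letter
    raise ValueError(f"unknown form: {form!r}")


# One row of a table of forms for U+07D3 (Ba).
BA = "\u07D3"
row = [isolated_display(BA, form) for form in ("isolated", "initial", "medial", "final")]
print([len(cell) for cell in row])
# [1, 2, 3, 2]
```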

The right-to-left mark (RLM) and left-to-right mark (LRM) are used by the Unicode bidirectionality algorithm (BiDi) to indicate the points in a text run at which the writing direction changes.
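Both marks are invisible characters with a strong directional class, which is what lets them influence the BiDi algorithm. As a quick check, assuming Python's `unicodedata` module as the data source, the bidirectional class of each mark can be compared with that of an N'Ko letter and a Latin letter:

```python
import unicodedata

LRM = "\u200E"
RLM = "\u200F"

samples = (
    ("LRM", LRM),
    ("RLM", RLM),
    ("N'Ko letter A", "\u07CA"),
    ("Latin letter A", "A"),
)

# Collect the Bidi_Class value reported for each sample character.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples}
for label, bidi_class in bidi_classes.items():
    print(f"{label}: {bidi_class}")
```

LRM shares the strong left-to-right class of the Latin letter, and RLM shares the strong right-to-left class of the N'Ko letter.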

The no-break space is primarily used to display those codepoints that are defined as non-spacing (such as the combining tone marks and the nasalization mark) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder.
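Either placeholder can be prepended to a mark to build such a display string. The following is a minimal sketch; `show_mark_in_isolation` is a hypothetical helper, and it accepts any character whose general category begins with `M` as a mark.

```python
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"


def show_mark_in_isolation(mark, placeholder=NBSP):
    """Return `placeholder` followed by `mark`, for displaying the mark alone."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a combining mark")
    return placeholder + mark


HIGH_TONE = "\u07EB"
on_space = show_mark_in_isolation(HIGH_TONE)
on_circle = show_mark_in_isolation(HIGH_TONE, DOTTED_CIRCLE)
print([f"U+{ord(c):04X}" for c in on_space])
# ['U+00A0', 'U+07EB']
print([f"U+{ord(c):04X}" for c in on_circle])
# ['U+25CC', 'U+07EB']
```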