The 'PUAA' table
The Private Use Area attribute table (tag name: `'PUAA'`) provides a method for specifying Unicode character properties for characters in the Unicode Private Use Area (code points `E000-F8FF`, `F0000-FFFFD`, and `100000-10FFFD`).
This table is not part of the Unicode Standard and is not endorsed by the Unicode Consortium. The information included in this table should be considered informative only and is not expected to be used by any text rendering engine.
The table header gives the table version number, the number of properties defined, and an array of property records, each of which holds byte offsets to a property name and a property subtable. The format of the table header is as follows:
| Type | Name | Description |
|---|---|---|
| UInt16 | version | The table version number. Set to 1. |
| UInt16 | propertyCount | The number of properties defined. |
| PropertyRecord | propertyRecord[propertyCount] | The property records array. |
The property records array follows the table header. Each record consists of an offset to the property name and an offset to the subtable header. The format of a `PropertyRecord` is as follows:
| Type | Name | Description |
|---|---|---|
| UInt32 | propertyNameOffset | Offset in bytes from the start of the table to the property name. |
| UInt32 | subtableHeaderOffset | Offset in bytes from the start of the table to the subtable header. |
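As a minimal sketch of how a reader might walk the header and the property records, the Python below unpacks both with `struct`. It assumes big-endian byte order, which is usual for font tables but is not stated in this description. The function name `parse_puaa_header` and the sample bytes are hypothetical.

```python
import struct

def parse_puaa_header(table: bytes):
    """Return (version, [(property_name_offset, subtable_header_offset), ...])."""
    # Assumption: all multi-byte fields are big-endian.
    version, property_count = struct.unpack_from(">HH", table, 0)
    records = []
    for i in range(property_count):
        # Each PropertyRecord is two UInt32 offsets (8 bytes); the array
        # starts right after the 4-byte header.
        name_off, sub_off = struct.unpack_from(">II", table, 4 + 8 * i)
        records.append((name_off, sub_off))
    return version, records

# Hypothetical table fragment: version 1, one property record.
sample = struct.pack(">HH", 1, 1) + struct.pack(">II", 12, 19)
print(parse_puaa_header(sample))  # (1, [(12, 19)])
```

Both offsets are relative to the start of the table, so they can be passed unchanged to whatever reads the name and the subtable.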
Property records should be sorted in ascending order by property name. The property name consists of a length byte followed by the name in UTF-8.
| Type | Name | Description |
|---|---|---|
| UInt8 | length | Length of string in bytes. |
| UInt8 | name[length] | UTF-8 encoded string. |
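Reading such a length-prefixed string takes only a few lines. The sketch below is illustrative: the helper name `read_length_prefixed` and the sample bytes are hypothetical, not part of this description.

```python
def read_length_prefixed(table: bytes, offset: int) -> str:
    """Read a length byte followed by that many bytes of UTF-8."""
    length = table[offset]  # UInt8 length, counted in bytes, not characters
    return table[offset + 1 : offset + 1 + length].decode("utf-8")

# Hypothetical data: two filler bytes, then the name "Script" at offset 2.
blob = b"\x00\x00" + bytes([6]) + b"Script"
print(read_length_prefixed(blob, 2))  # Script
```

Because the length is a single byte, a name stored this way is limited to 255 bytes of UTF-8.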
Each property record points to a property subtable header consisting of the number of entries and an array of entry records. The format of the subtable header is as follows:
| Type | Name | Description |
|---|---|---|
| UInt16 | entryCount | The number of entries in this subtable. |
| EntryRecord | entryRecord[entryCount] | The entry records array. |
The entry records array follows the subtable header. Each record consists of a type, a range of code points, and a value or offset.
| Type | Name | Description |
|---|---|---|
| UInt8 | entryType | The type of the entry data. |
| UInt8 | plane | The Unicode plane of the range of code points covered by this entry. One of 0, 15, or 16. |
| UInt16 | firstCodePoint | The least significant 16 bits of the first code point covered by this entry. |
| UInt16 | lastCodePoint | The least significant 16 bits of the last code point covered by this entry. |
| UInt32 | entryData | Interpretation varies according to `entryType`. |
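The sketch below shows one way to read a subtable header and its 10-byte entry records, combining `plane` with the 16-bit `firstCodePoint` and `lastCodePoint` fields to recover full code points. Big-endian byte order is an assumption, and the names `EntryRecord` (as a Python class) and `parse_subtable` are hypothetical.

```python
import struct
from typing import NamedTuple

class EntryRecord(NamedTuple):
    entry_type: int
    first: int  # full code point: (plane << 16) | firstCodePoint
    last: int   # full code point: (plane << 16) | lastCodePoint
    data: int   # raw entryData, interpreted according to entry_type

# Assumption: big-endian. UInt8, UInt8, UInt16, UInt16, UInt32 = 10 bytes.
ENTRY_STRUCT = struct.Struct(">BBHHI")

def parse_subtable(table: bytes, offset: int):
    """Read entryCount and the entry records that follow it."""
    (entry_count,) = struct.unpack_from(">H", table, offset)
    entries = []
    pos = offset + 2
    for _ in range(entry_count):
        etype, plane, first16, last16, data = ENTRY_STRUCT.unpack_from(table, pos)
        entries.append(
            EntryRecord(etype, (plane << 16) | first16, (plane << 16) | last16, data)
        )
        pos += ENTRY_STRUCT.size
    return entries

# Hypothetical subtable: one Boolean entry (type 3) covering all of plane 15.
sample = struct.pack(">H", 1) + ENTRY_STRUCT.pack(3, 15, 0x0000, 0xFFFD, 1)
for entry in parse_subtable(sample, 0):
    print(entry.entry_type, hex(entry.first), hex(entry.last), entry.data)
```

Since a record stores only one plane byte, a single entry cannot span planes; a property covering both plane 15 and plane 16 needs at least two entries.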
The `entryType` field determines the interpretation of the `entryData` field:
| Entry Type ID | Entry Type | Meaning |
|---|---|---|
| 1 | Single | The `entryData` field contains either an offset to a UTF-8 character string or a 0-4 byte ASCII character string. |
| 2 | Multiple | The `entryData` field contains an offset to an array of Single values, one for each code point. |
| 3 | Boolean | The `entryData` field contains zero for a false property value or nonzero for a true property value. |
| 4 | Decimal | The `entryData` field contains a plain integer value. |
| 5 | Hexadecimal | The `entryData` field contains a Unicode code point value. |
| 6 | HexMultiple | The `entryData` field contains an offset to an array of Hexadecimal values, one for each code point. |
| 7 | HexSequence | The `entryData` field contains an offset to an array of Unicode code point values. |
| 8 | CaseMapping | The `entryData` field contains an offset to an array of Unicode code point values, plus a Single value. |
| 9 | NameAlias | The `entryData` field contains an offset to an array of two Single values, one for a name and one for a name type. |
For entry type 1 (Single), the `entryData` field holds a string value in one of two forms.

If the most significant bit of the `entryData` field is clear, the `entryData` field contains an offset in bytes from the start of the table to a property value string. The property value string consists of a length byte followed by the value in UTF-8.

If the most significant bit of the `entryData` field is set, the `entryData` field itself contains up to four ASCII characters, padded with null bytes if there are fewer than four. For example, `CC61746E` is the string `"Latn"` (the leading byte `CC` is `4C`, the letter `L`, with the most significant bit set), `CC750000` is the string `"Lu"`, and `80000000` is the empty string `""`.
The value of the entry applies to the entire range of code points. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.
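The two encodings of a Single value can be told apart with one bit test. The sketch below is illustrative only: the name `decode_single` is hypothetical, big-endian byte order is assumed, and the inline form is decoded by clearing the marker bit and stripping the null padding, which matches the three examples given for this entry type.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a Single value: inline ASCII if the top bit is set, else an offset."""
    if value & 0x80000000:
        # Inline form: clear the marker bit, then drop trailing null padding.
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    # Offset form: length byte followed by UTF-8, relative to the table start.
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

# Hypothetical table bytes holding the string "Greek" at offset 4.
table = b"\x00" * 4 + b"\x05Greek"
print(decode_single(table, 0xCC61746E))  # Latn
print(decode_single(table, 0xCC750000))  # Lu
print(decode_single(table, 4))           # Greek
```

The inline form avoids a separate string record for short values such as script codes and general category codes, which fit in four ASCII characters.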
For entry type 2 (Multiple), the `entryData` field contains an offset in bytes from the start of the table to an array of values. The array starts with the number of values, which should equal the number of code points covered by the entry:
| Type | Name | Description |
|---|---|---|
| UInt16 | valueCount | The number of values. Should equal `lastCodePoint - firstCodePoint + 1`. |
| UInt32 | valueData[valueCount] | The array of values, each interpreted as in entry type 1. |
The first value applies to `firstCodePoint`, the second to `firstCodePoint + 1`, and so on. Each `valueData` field contains either an offset to a string, or up to four ASCII characters with the most significant bit set, as in entry type 1. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.
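Looking up the value for one code point in a type 2 array is an index calculation followed by the same decoding as entry type 1. The following is a hedged sketch: `lookup_multiple` and `decode_single` are hypothetical helper names, and big-endian byte order is assumed.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def lookup_multiple(table: bytes, array_offset: int, first_cp: int, cp: int) -> str:
    """Return the string for code point cp from a type 2 value array."""
    (value_count,) = struct.unpack_from(">H", table, array_offset)
    index = cp - first_cp
    if not 0 <= index < value_count:
        raise IndexError("code point is outside the range covered by the entry")
    # valueData entries are UInt32 and follow the 2-byte count.
    (value,) = struct.unpack_from(">I", table, array_offset + 2 + 4 * index)
    return decode_single(table, value)

# Hypothetical array for U+E000..U+E001 with inline values "Lu" and "Ll".
table = struct.pack(">H", 2) + struct.pack(">II", 0xCC750000, 0xCC6C0000)
print(lookup_multiple(table, 0, 0xE000, 0xE001))  # Ll
```

To apply the concatenation rule, a caller would collect the results of every matching type 1 and type 2 entry in table order and join them.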
For entry type 3 (Boolean), the `entryData` field contains a boolean value. If zero, the property value is false. If nonzero, the property value is true. The value of the entry applies to the entire range of code points.

This entry type is used for boolean properties such as `Bidi_Mirrored` from `UnicodeData.txt`, `White_Space` from `PropList.txt`, or `Emoji_Component` from `emoji-data.txt`.
For entry type 4 (Decimal), the `entryData` field contains a decimal integer value. The value of the entry applies to the entire range of code points.

This entry type is used for `Canonical_Combining_Class` from `UnicodeData.txt` as well as some Unihan properties such as `kTotalStrokes`.
For entry type 5 (Hexadecimal), the `entryData` field contains a single code point value. The value of the entry applies to the entire range of code points.

This entry type is used for properties such as `Bidi_Mirroring_Glyph` and `Simple_Uppercase_Mapping`.
For entry type 6 (HexMultiple), the `entryData` field contains an offset in bytes from the start of the table to an array of code point values. The array starts with the number of code point values, which should equal the number of code points covered by the entry:
| Type | Name | Description |
|---|---|---|
| UInt16 | valueCount | The number of values. Should equal `lastCodePoint - firstCodePoint + 1`. |
| UInt32 | valueData[valueCount] | The array of values, each interpreted as in entry type 5. |
The first code point value applies to `firstCodePoint`, the second to `firstCodePoint + 1`, and so on. Each `valueData` field contains a single code point value, as in entry type 5.

This entry type is used for properties such as `Bidi_Mirroring_Glyph` and `Simple_Uppercase_Mapping`.
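A per-code-point mapping stored as entry type 6 can be resolved with the same index arithmetic as entry type 2, except that each value is used directly as a code point. This is a sketch under assumptions: the name `lookup_hex_multiple` and the sample mapping are hypothetical, and big-endian byte order is assumed.

```python
import struct

def lookup_hex_multiple(table: bytes, array_offset: int, first_cp: int, cp: int) -> int:
    """Return the code point mapped to cp in a type 6 value array."""
    (value_count,) = struct.unpack_from(">H", table, array_offset)
    index = cp - first_cp
    if not 0 <= index < value_count:
        raise IndexError("code point is outside the range covered by the entry")
    # Each valueData element is a UInt32 code point following the 2-byte count.
    (mapped,) = struct.unpack_from(">I", table, array_offset + 2 + 4 * index)
    return mapped

# Hypothetical uppercase mapping: U+E010..U+E012 map to U+E000..U+E002.
table = struct.pack(">H", 3) + struct.pack(">III", 0xE000, 0xE001, 0xE002)
print(hex(lookup_hex_multiple(table, 0, 0xE010, 0xE011)))  # 0xe001
```

Compared with three separate type 5 entries, one type 6 entry stores the range once and spends four bytes per code point on the values.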
For entry type 7 (HexSequence), the `entryData` field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5.
| Type | Name | Description |
|---|---|---|
| UInt16 | valueCount | The number of code point values. |
| UInt32 | valueData[valueCount] | The array of code point values. |
The array itself is the value of the entry, which applies to the entire range of code points.
This entry type is used for the `Decomposition_Mapping` property from `UnicodeData.txt`.
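Unlike entry type 6, the whole array here is a single value shared by every code point in the range. A minimal sketch of reading it follows; the name `read_hex_sequence` and the sample decomposition are hypothetical, and big-endian byte order is assumed.

```python
import struct

def read_hex_sequence(table: bytes, array_offset: int) -> list:
    """Return the full list of code points stored in a type 7 array."""
    (value_count,) = struct.unpack_from(">H", table, array_offset)
    # Unpack valueCount UInt32 values that follow the 2-byte count.
    return list(struct.unpack_from(f">{value_count}I", table, array_offset + 2))

# Hypothetical decomposition: LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT.
table = struct.pack(">H", 2) + struct.pack(">II", 0x0041, 0x0301)
sequence = read_hex_sequence(table, 0)
print([f"{cp:04X}" for cp in sequence])  # ['0041', '0301']
```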
For entry type 8 (CaseMapping), the `entryData` field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5, followed by one string value, interpreted as in entry type 1.
| Type | Name | Description |
|---|---|---|
| UInt16 | valueCount | The number of values: the number of code points, plus 1. |
| UInt32 | mappingValueData[valueCount-1] | The array of code point values. |
| UInt32 | conditionValueData | The condition value, interpreted as in entry type 1. |
The array itself is the value of the entry, which applies to the entire range of code points.
This entry type is used for the special casing properties from `SpecialCasing.txt`.
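The layout above, with the mapped code points first and the condition string last, can be split apart as in the sketch below. The names `read_case_mapping` and `decode_single` are hypothetical, big-endian byte order is assumed, and the sample (a two-code-point mapping with an empty condition) is invented for illustration.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def read_case_mapping(table: bytes, array_offset: int):
    """Return (mapped_code_points, condition) from a type 8 array."""
    (value_count,) = struct.unpack_from(">H", table, array_offset)
    if value_count < 1:
        raise ValueError("a CaseMapping array needs at least the condition value")
    values = struct.unpack_from(f">{value_count}I", table, array_offset + 2)
    mapping = list(values[:-1])                    # mappingValueData
    condition = decode_single(table, values[-1])   # conditionValueData
    return mapping, condition

# Hypothetical mapping to "SS" (U+0053 U+0053) with an empty condition.
table = struct.pack(">H", 3) + struct.pack(">III", 0x0053, 0x0053, 0x80000000)
print(read_case_mapping(table, 0))  # ([83, 83], '')
```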
For entry type 9 (NameAlias), the `entryData` field contains an offset in bytes from the start of the table to an array. The array contains two string values, interpreted as in entry type 1.
| Type | Name | Description |
|---|---|---|
| UInt16 | valueCount | The number of values. Must equal 2. |
| UInt32 | aliasValueData | The alias value, interpreted as in entry type 1. |
| UInt32 | typeValueData | The type value, interpreted as in entry type 1. |
The array itself is the value of the entry, which applies to the entire range of code points.
This entry type is used for the `Name_Alias` property from `NameAliases.txt`.
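A reader for this fixed two-value array might look like the sketch below. The names `read_name_alias` and `decode_single` are hypothetical, big-endian byte order is assumed, and the sample alias and type strings are invented for illustration.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or offset to a string)."""
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def read_name_alias(table: bytes, array_offset: int):
    """Return (alias, alias_type) from a type 9 array."""
    # UInt16 count followed by two UInt32 values: 10 bytes in total.
    value_count, alias_value, type_value = struct.unpack_from(">HII", table, array_offset)
    if value_count != 2:
        raise ValueError("a NameAlias array must contain exactly 2 values")
    return decode_single(table, alias_value), decode_single(table, type_value)

# Hypothetical table: the array at offset 0, then two length-prefixed strings.
alias, kind = b"MY ALIAS", b"control"
strings = bytes([len(alias)]) + alias + bytes([len(kind)]) + kind
table = struct.pack(">HII", 2, 10, 10 + 1 + len(alias)) + strings
print(read_name_alias(table, 0))  # ('MY ALIAS', 'control')
```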