The 'PUAA' table

General table information

The Private Use Area attribute table (tag name: 'PUAA') provides a method for specifying Unicode character properties for characters in the Unicode Private Use Area (code points E000-F8FF, F0000-FFFFD, and 100000-10FFFD).

This table is not part of the Unicode Standard and is not endorsed by the Unicode Consortium. The information included in this table should be considered informative only and is not expected to be used by any text rendering engine.

PUA Attribute Table Format

The table header gives the table version number and the number of properties defined, followed by the property records array, which holds the offsets in bytes to each property name and subtable. The format of the table header is as follows:

Type	Name	Description
`UInt16`	`version`	The table version number. Set to 1.
`UInt16`	`propertyCount`	The number of properties defined.
`PropertyRecord`	`propertyRecord[propertyCount]`	The property records array.

The property records array follows the table header. Each record consists of an offset to the property name and an offset to the subtable header. Here is the format of a PropertyRecord:

Type	Name	Description
`UInt32`	`propertyNameOffset`	Offset in bytes from the start of the table to the property name.
`UInt32`	`subtableHeaderOffset`	Offset in bytes from the start of the table to the subtable header.

Property records should be sorted in ascending order by property name. The property name consists of a length byte followed by the name in UTF-8.

Type	Name	Description
`UInt8`	`length`	Length of string in bytes.
`UInt8`	`name[length]`	UTF-8 encoded string.
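
The header, property records, and property names described above can be read with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and it assumes multi-byte fields are big-endian (as in other TrueType tables), which the text above does not state explicitly.

```python
import struct

def read_property_name(table: bytes, offset: int) -> str:
    """Read a length-prefixed UTF-8 string starting at `offset`."""
    length = table[offset]
    return table[offset + 1 : offset + 1 + length].decode("utf-8")

def read_properties(table: bytes) -> list:
    """Return (property name, subtableHeaderOffset) pairs from the table header."""
    version, property_count = struct.unpack_from(">HH", table, 0)
    if version != 1:
        raise ValueError(f"unsupported PUAA version {version}")
    properties = []
    for i in range(property_count):
        # Each PropertyRecord is two UInt32 offsets (8 bytes) after the 4-byte header.
        name_offset, subtable_offset = struct.unpack_from(">II", table, 4 + 8 * i)
        properties.append((read_property_name(table, name_offset), subtable_offset))
    return properties

# A hand-built table with a single property, for illustration only.
header = struct.pack(">HH", 1, 1)      # version 1, one property
record = struct.pack(">II", 12, 15)    # name at offset 12, subtable at offset 15
name = b"\x02gc"                       # length byte followed by "gc"
subtable = struct.pack(">H", 0)        # subtable header with entryCount = 0
table = header + record + name + subtable

print(read_properties(table))  # [('gc', 15)]
```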

Property Subtable Format

Each property record points to a property subtable header consisting of the number of entries and an array of entry records. The format of the subtable header is as follows:

Type	Name	Description
`UInt16`	`entryCount`	The number of entries in this subtable.
`EntryRecord`	`entryRecord[entryCount]`	The entry records array.

The entry records array follows the subtable header. Each record consists of a type, a range of code points, and a value or offset.

Type	Name	Description
`UInt8`	`entryType`	The type of the entry data.
`UInt8`	`plane`	The Unicode plane of the range of code points covered by this entry. One of 0, 15, or 16.
`UInt16`	`firstCodePoint`	The least significant 16 bits of the first code point covered by this entry.
`UInt16`	`lastCodePoint`	The least significant 16 bits of the last code point covered by this entry.
`UInt32`	`entryData`	Interpretation varies according to `entryType`.
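
As an illustration of the entry record layout, the sketch below reads a subtable and combines `plane` with the 16-bit `firstCodePoint` and `lastCodePoint` fields to recover full code points. The `Entry` type and `read_entries` function are hypothetical names, and big-endian byte order is an assumption.

```python
import struct
from typing import NamedTuple

class Entry(NamedTuple):
    entry_type: int
    first: int   # full first code point, plane included
    last: int    # full last code point, plane included
    data: int    # raw entryData, interpreted according to entry_type

def read_entries(table: bytes, subtable_offset: int) -> list:
    """Read the subtable header and all entry records at `subtable_offset`."""
    (entry_count,) = struct.unpack_from(">H", table, subtable_offset)
    entries = []
    for i in range(entry_count):
        # An entry record is 1 + 1 + 2 + 2 + 4 = 10 bytes.
        entry_type, plane, first16, last16, data = struct.unpack_from(
            ">BBHHI", table, subtable_offset + 2 + 10 * i
        )
        entries.append(
            Entry(entry_type, (plane << 16) | first16, (plane << 16) | last16, data)
        )
    return entries

# One Boolean entry covering all of plane 15's private use range.
subtable = struct.pack(">H", 1) + struct.pack(">BBHHI", 3, 15, 0x0000, 0xFFFD, 1)
table = b"\x00" * 4 + subtable  # 4 filler bytes stand in for the table header

entries = read_entries(table, 4)
print(entries[0].entry_type, hex(entries[0].first), hex(entries[0].last))
# 3 0xf0000 0xffffd
```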

The entryType field determines the interpretation of the entryData field:

Entry Type ID	Entry Type	Meaning
`1`	`Single`	The `entryData` field contains either an offset to a UTF-8 character string or a 0-4 byte ASCII character string.
`2`	`Multiple`	The `entryData` field contains an offset to an array of `Single` values, one for each code point.
`3`	`Boolean`	The `entryData` field contains zero for a `false` property value or nonzero for a `true` property value.
`4`	`Decimal`	The `entryData` field contains a plain integer value.
`5`	`Hexadecimal`	The `entryData` field contains a Unicode code point value.
`6`	`HexMultiple`	The `entryData` field contains an offset to an array of `Hexadecimal` values, one for each code point.
`7`	`HexSequence`	The `entryData` field contains an offset to an array of Unicode code point values.
`8`	`CaseMapping`	The `entryData` field contains an offset to an array of Unicode code point values, plus a `Single` value.
`9`	`NameAlias`	The `entryData` field contains an offset to an array of two `Single` values, one for a name and one for a name type.

Entry Type 1 (`Single`)

If the most significant bit of the entryData field is clear, the entryData field contains an offset in bytes from the start of the table to a property value string. The property value string consists of a length byte followed by the value in UTF-8.

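If the most significant bit of the entryData field is set, the entryData field itself contains up to four ASCII characters, padded with null bytes if there are fewer than four. The most significant bit acts only as a flag and is not part of the first character. For example, CC61746E is the string "Latn" (4C61746E with the most significant bit set), CC750000 is the string "Lu", and 80000000 is the empty string "".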
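
The two encodings of a `Single` value can be told apart with one bit test. The following is a minimal sketch; `decode_single` is a hypothetical helper name, and big-endian byte order is an assumption consistent with the hexadecimal examples above.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a Single value: inline ASCII if the top bit is set, else a string offset."""
    if value & 0x80000000:
        # Clear the flag bit, then read the four bytes as ASCII and drop null padding.
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    # Otherwise the value is an offset to a length byte followed by UTF-8 data.
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

# 4 filler bytes stand in for the table header; a string follows at offset 4.
table = b"\x00" * 4 + b"\x05hello"

print(decode_single(table, 0xCC61746E))        # Latn
print(decode_single(table, 0xCC750000))        # Lu
print(repr(decode_single(table, 0x80000000)))  # ''
print(decode_single(table, 4))                 # hello
```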

The value of the entry applies to the entire range of code points. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.

Entry Type 2 (`Multiple`)

The entryData field contains an offset in bytes from the start of the table to an array of values. The array starts with the number of values, which should equal the number of code points covered by the entry:

Type	Name	Description
`UInt16`	`valueCount`	The number of values. Should equal `lastCodePoint - firstCodePoint + 1`.
`UInt32`	`valueData[valueCount]`	The array of values, each interpreted as in entry type 1.

The first value applies to firstCodePoint, the second to firstCodePoint + 1, and so on. Each valueData field contains either an offset to a string, or up to four ASCII characters with the most significant bit set, as in entry type 1. If multiple entries of type 1 or 2 apply to a code point, the value of the property for that code point shall be the concatenation of all values of matching entries, in the order in which they are encountered. It is possible for the resulting property value to exceed 255 bytes.
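
The lookup and concatenation rule for entry types 1 and 2 can be sketched as follows. The helper names (`inline`, `decode_single`, `string_property`) are hypothetical, entries are represented as plain tuples with full code points, big-endian byte order is assumed, and the property values are arbitrary strings chosen for illustration.

```python
import struct

def inline(text: str) -> int:
    """Pack up to four ASCII characters into a Single value with the top bit set."""
    raw = text.encode("ascii").ljust(4, b"\x00")
    return struct.unpack(">I", raw)[0] | 0x80000000

def decode_single(table: bytes, value: int) -> str:
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def string_property(table: bytes, entries, code_point: int) -> str:
    """Concatenate the values of all type 1 and type 2 entries covering code_point.

    `entries` holds (entry_type, first, last, entry_data) tuples in table order.
    """
    parts = []
    for entry_type, first, last, data in entries:
        if not first <= code_point <= last:
            continue
        if entry_type == 1:
            parts.append(decode_single(table, data))
        elif entry_type == 2:
            (value_count,) = struct.unpack_from(">H", table, data)
            index = code_point - first
            if index < value_count:  # guard against a short array
                (value,) = struct.unpack_from(">I", table, data + 2 + 4 * index)
                parts.append(decode_single(table, value))
    return "".join(parts)

# A Multiple array at offset 4 with one value per code point in E000-E001.
table = b"\x00" * 4 + struct.pack(">HII", 2, inline("ab"), inline("cd"))
entries = [
    (2, 0xE000, 0xE001, 4),            # Multiple: "ab" for E000, "cd" for E001
    (1, 0xE000, 0xE001, inline("!")),  # Single: "!" for the whole range
]

print(string_property(table, entries, 0xE000))  # ab!
print(string_property(table, entries, 0xE001))  # cd!
```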

Entry Type 3 (`Boolean`)

The entryData field contains a boolean value. If zero, the property value is false. If nonzero, the property value is true. The value of the entry applies to the entire range of code points.

This entry type is used for boolean properties such as Bidi_Mirrored from UnicodeData.txt, White_Space from PropList.txt, or Emoji_Component from emoji-data.txt.

Entry Type 4 (`Decimal`)

The entryData field contains a decimal integer value. The value of the entry applies to the entire range of code points.

This entry type is used for Canonical_Combining_Class from UnicodeData.txt as well as some Unihan properties such as kTotalStrokes.

Entry Type 5 (`Hexadecimal`)

The entryData field contains a single code point value. The value of the entry applies to the entire range of code points.

This entry type is used for properties such as Bidi_Mirroring_Glyph and Simple_Uppercase_Mapping.
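
Entry types 3, 4, and 5 all store their value directly in `entryData`, so interpreting them needs no further table access. The sketch below uses a hypothetical `scalar_value` function; treating the `Decimal` value as unsigned and formatting code points as at least four uppercase hexadecimal digits (the style used in the Unicode data files) are assumptions.

```python
def scalar_value(entry_type: int, entry_data: int):
    """Interpret entryData for the Boolean, Decimal, and Hexadecimal entry types."""
    if entry_type == 3:
        return entry_data != 0        # Boolean: zero is false, nonzero is true
    if entry_type == 4:
        return entry_data             # Decimal: plain integer (assumed unsigned)
    if entry_type == 5:
        return f"{entry_data:04X}"    # Hexadecimal: a code point, e.g. "0029"
    raise ValueError(f"entry type {entry_type} is not a scalar type")

print(scalar_value(3, 1))        # True
print(scalar_value(4, 230))      # 230
print(scalar_value(5, 0x0029))   # 0029
print(scalar_value(5, 0xF0001))  # F0001
```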

Entry Type 6 (`HexMultiple`)

The entryData field contains an offset in bytes from the start of the table to an array of code point values. The array starts with the number of code point values, which should equal the number of code points covered by the entry:

Type	Name	Description
`UInt16`	`valueCount`	The number of values. Should equal `lastCodePoint - firstCodePoint + 1`.
`UInt32`	`valueData[valueCount]`	The array of values, each interpreted as in entry type 5.

The first code point value applies to firstCodePoint, the second to firstCodePoint + 1, and so on. Each valueData field contains a single code point value, as in entry type 5.
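
Indexing into a `HexMultiple` array might look like the sketch below. `hex_multiple_value` is a hypothetical name, big-endian byte order is assumed, and the mapping from E000-E002 to E100-E102 is invented purely for illustration.

```python
import struct

def hex_multiple_value(table: bytes, offset: int, first: int, code_point: int):
    """Return the code point value for code_point, or None if it is out of range."""
    (value_count,) = struct.unpack_from(">H", table, offset)
    index = code_point - first
    if not 0 <= index < value_count:
        return None
    (value,) = struct.unpack_from(">I", table, offset + 2 + 4 * index)
    return value

# 4 filler bytes stand in for the table header; the array follows at offset 4.
table = b"\x00" * 4 + struct.pack(">H3I", 3, 0xE100, 0xE101, 0xE102)

print(hex(hex_multiple_value(table, 4, 0xE000, 0xE001)))  # 0xe101
print(hex_multiple_value(table, 4, 0xE000, 0xE003))       # None
```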

This entry type is used for properties such as Bidi_Mirroring_Glyph and Simple_Uppercase_Mapping.

Entry Type 7 (`HexSequence`)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5.

Type	Name	Description
`UInt16`	`valueCount`	The number of code point values.
`UInt32`	`valueData[valueCount]`	The array of code point values.

The array itself is the value of the entry, which applies to the entire range of code points.
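
Reading a `HexSequence` array is a matter of reading the count and then that many 32-bit values. A minimal sketch, with a hypothetical function name, assumed big-endian byte order, and an invented decomposition (U+0041 followed by U+0301) as sample data:

```python
import struct

def read_hex_sequence(table: bytes, offset: int) -> list:
    """Return the list of code point values stored at `offset`."""
    (value_count,) = struct.unpack_from(">H", table, offset)
    return list(struct.unpack_from(f">{value_count}I", table, offset + 2))

# 4 filler bytes stand in for the table header; the array follows at offset 4.
table = b"\x00" * 4 + struct.pack(">H2I", 2, 0x0041, 0x0301)

print([f"{cp:04X}" for cp in read_hex_sequence(table, 4)])  # ['0041', '0301']
```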

This entry type is used for the Decomposition_Mapping property from UnicodeData.txt.

Entry Type 8 (`CaseMapping`)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains any number of code point values, interpreted as in entry type 5, followed by one string value, interpreted as in entry type 1.

Type	Name	Description
`UInt16`	`valueCount`	The number of values: the number of code points, plus 1.
`UInt32`	`mappingValueData[valueCount-1]`	The array of code point values.
`UInt32`	`conditionValueData`	The condition value, interpreted as in entry type 1.

The array itself is the value of the entry, which applies to the entire range of code points.
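
A `CaseMapping` array differs from a `HexSequence` only in that its last value is a condition string. The sketch below splits the two parts; the function names are hypothetical, big-endian byte order is assumed, and the sample mapping (two code points with an empty condition) is invented for illustration.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or string offset)."""
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def read_case_mapping(table: bytes, offset: int):
    """Return (list of mapped code points, condition string)."""
    (value_count,) = struct.unpack_from(">H", table, offset)
    if value_count < 1:
        raise ValueError("CaseMapping array must include a condition value")
    values = struct.unpack_from(f">{value_count}I", table, offset + 2)
    # All values but the last are code points; the last is the condition.
    return list(values[:-1]), decode_single(table, values[-1])

# Two mapped code points followed by an empty condition (80000000).
table = b"\x00" * 4 + struct.pack(">H3I", 3, 0x0053, 0x0053, 0x80000000)

mapping, condition = read_case_mapping(table, 4)
print([f"{cp:04X}" for cp in mapping], repr(condition))  # ['0053', '0053'] ''
```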

This entry type is used for the special casing properties from SpecialCasing.txt.

Entry Type 9 (`NameAlias`)

The entryData field contains an offset in bytes from the start of the table to an array. The array contains two string values, interpreted as in entry type 1.

Type	Name	Description
`UInt16`	`valueCount`	The number of values. Must equal 2.
`UInt32`	`aliasValueData`	The alias value, interpreted as in entry type 1.
`UInt32`	`typeValueData`	The type value, interpreted as in entry type 1.

The array itself is the value of the entry, which applies to the entire range of code points.
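
A `NameAlias` array can be read the same way as the other arrays. In this sketch the function names are hypothetical, big-endian byte order is assumed, and the alias "PUA LETTER A" with type "figment" is made-up sample data.

```python
import struct

def decode_single(table: bytes, value: int) -> str:
    """Decode a value as in entry type 1 (inline ASCII or string offset)."""
    if value & 0x80000000:
        raw = struct.pack(">I", value & 0x7FFFFFFF)
        return raw.rstrip(b"\x00").decode("ascii")
    length = table[value]
    return table[value + 1 : value + 1 + length].decode("utf-8")

def read_name_alias(table: bytes, offset: int):
    """Return (alias, alias type) from a NameAlias array."""
    value_count, alias_value, type_value = struct.unpack_from(">HII", table, offset)
    if value_count != 2:
        raise ValueError(f"NameAlias valueCount must be 2, got {value_count}")
    return decode_single(table, alias_value), decode_single(table, type_value)

# Two length-prefixed strings after 4 filler bytes, then the array that points to them.
alias = b"\x0cPUA LETTER A"
kind = b"\x07figment"
table = b"\x00" * 4 + alias + kind
array_offset = len(table)
table += struct.pack(">HII", 2, 4, 4 + len(alias))

print(read_name_alias(table, array_offset))  # ('PUA LETTER A', 'figment')
```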

This entry type is used for the Name_Alias property from NameAliases.txt.
