Back Home

ZCM Type System

Note: There has been a recent change to the ZCM hashing system; see the announcement for details.

This page describes the ZCM Type System grammar, encoding, and type hashes in very formal terms. Unless you're intimately concerned with the subtleties, feel free to skim this document and refer back to it as a reference.

Grammar

Primitives

int8_t    8-bit signed integer
int16_t   16-bit signed integer
int32_t   32-bit signed integer
int64_t   64-bit signed integer
float     32-bit IEEE floating point value
double    64-bit IEEE floating point value
string    UTF-8 string
boolean   true/false logical value
byte      8-bit value

Specification

The grammar is given in EBNF using regex-style repetition and character classes:

file          = zcmtype*
zcmtype       = 'struct' name '{' field* '}'
field         = const_field | data_field
const_field   = 'const' const_type name '=' const_literal ';'
const_type    = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t' | 'float' | 'double'
const_literal = hex_literal | int_literal | float_literal
data_field    = type name arraydim* ';'
type          = primitive | name
primitive     = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t' | 'float' | 'double' | 'string' | 'boolean' | 'byte'
int_type      = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t'
arraydim      = '[' arraysize ']'
arraysize     = name | uint_literal
name          = underalpha underalphanum*
underalpha    = [A-Za-z_]
underalphanum = [A-Za-z0-9_]
hex_literal   = "0x" hexdigit+
hexdigit      = [0-9A-Fa-f]
uint_literal  = [0-9]+
int_literal   = '-'? uint_literal

Semantic Constraints

To be well-formed, a file that parses under the grammar above must also satisfy the following constraints:

  • Each field's name must be unique within its 'zcmtype'
  • Names used for array sizes must refer to a field in the same 'zcmtype' that has a scalar integer type
  • Names used for 'type' must refer to other 'zcmtype' definitions
    • These may exist in other files
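
The constraints above can be checked mechanically once a file has been parsed. The following is a minimal sketch in Python, assuming a hypothetical in-memory representation: `Field`, `check_struct`, and the `known_types` set are illustrative names and are not part of zcmgen.

```python
from dataclasses import dataclass, field as dc_field

PRIMITIVES = {"int8_t", "int16_t", "int32_t", "int64_t",
              "float", "double", "string", "boolean", "byte"}
INT_TYPES = {"int8_t", "int16_t", "int32_t", "int64_t"}


@dataclass
class Field:
    """Hypothetical parsed data_field: type, name, and the text of each array dimension."""
    type: str
    name: str
    dims: list = dc_field(default_factory=list)


def check_struct(fields, known_types):
    """Return a list of constraint violations (empty if the struct is well-formed)."""
    errors = []
    seen = {}
    for f in fields:
        # Constraint 1: field names must be unique.
        if f.name in seen:
            errors.append(f"duplicate field name '{f.name}'")
        seen[f.name] = f

    for f in fields:
        # Constraint 3: a non-primitive type must name another zcmtype.
        if f.type not in PRIMITIVES and f.type not in known_types:
            errors.append(f"unknown type '{f.type}' for field '{f.name}'")
        for dim in f.dims:
            if dim.isdigit():
                continue  # static size (uint_literal), nothing to resolve
            # Constraint 2: a named size must be a scalar integer field.
            size_field = seen.get(dim)
            if size_field is None:
                errors.append(f"array size '{dim}' is not a field")
            elif size_field.type not in INT_TYPES or size_field.dims:
                errors.append(f"array size '{dim}' is not a scalar integer")
    return errors
```

Because nested types may live in other files, `known_types` would have to be collected across every file being processed before this check is run.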

Encoding formats

Primitives

Type      Encoded Size     Format
int8_t    1 byte           X
int16_t   2 bytes          XX
int32_t   4 bytes          XXXX
int64_t   8 bytes          XXXXXXXX
float     4 bytes          XXXX
double    8 bytes          XXXXXXXX
string    4+len+1 bytes    LLLL<chars>N
boolean   1 byte           X
byte      1 byte           X

Where:

  • X is a data byte
  • L is a length byte
  • N is a null byte
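
As an illustration of the table, here is a minimal Python sketch built on the standard `struct` module. Two details are assumptions, since the table does not state them: multi-byte values are written in big-endian order, and the string length prefix counts the characters plus the trailing null (as LCM does). The name `encode_primitive` is hypothetical.

```python
import struct

# struct format codes for the fixed-size primitives. The ">" prefix selects
# big-endian byte order, which is an assumption of this sketch.
_FORMATS = {
    "int8_t": ">b", "int16_t": ">h", "int32_t": ">i", "int64_t": ">q",
    "float": ">f", "double": ">d", "boolean": ">?", "byte": ">B",
}


def encode_primitive(type_name, value):
    """Encode one primitive value following the size/format table above."""
    if type_name == "string":
        chars = value.encode("utf-8")
        # LLLL<chars>N: 4 length bytes, the characters, then a null byte.
        # Assumption: the length prefix counts the null terminator.
        return struct.pack(">i", len(chars) + 1) + chars + b"\x00"
    return struct.pack(_FORMATS[type_name], value)
```

For example, `encode_primitive("string", "abc")` yields 8 bytes: four length bytes, the three characters, and the null byte.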

Array Types

Array types are encoded as a simple sequence of encoded elements. The encoding does NOT include a length field for the dimensions. For static array dimensions, the size is already known by the decoder. For dynamic array dimensions, the size is encoded in another field (as mandated by the semantic constraints above). For these reasons, there is zero encoding overhead for arrays. This includes arrays of nested types.
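
To make the zero-overhead claim concrete, the sketch below encodes a hypothetical type `struct example { int32_t n; int16_t vals[n]; }` in Python. Big-endian byte order is an assumption, and both function names are illustrative only.

```python
import struct


def encode_int16_array(values):
    """Encode the elements back to back; no length prefix is written."""
    return b"".join(struct.pack(">h", v) for v in values)


def encode_example(vals):
    """Encode the hypothetical struct { int32_t n; int16_t vals[n]; }."""
    n = len(vals)
    # The dynamic dimension lives in the ordinary field `n`, encoded like
    # any other int32_t; the array itself adds no bytes of its own.
    return struct.pack(">i", n) + encode_int16_array(vals)
```

Three elements therefore occupy exactly 4 + 3 * 2 = 10 bytes: the `n` field plus the raw element data.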

Recursive/Nested Types

Nested types are also encoded with zero overhead. Since the decoder knows the layout, there is no reason to encode type metadata. Circular type dependencies are not currently supported.

Type Hashes

The compact encoding formats specified above are made possible by a type hash. Each encoded message starts with a 64-bit hash field. As seen above, this field is the only size overhead per message in ZCM Type encodings. Without the hash, the encoded data is at most the same size as an equivalent C struct. Further, the hash is a unique type identifier. The hash allows a decoder function to verify that a binary blob of data is encoded as expected.
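
A decoder can use the leading hash as a guard before touching the payload. The following Python sketch shows the idea; the function name `split_and_verify` is hypothetical, and reading the hash as a big-endian unsigned 64-bit integer is an assumption.

```python
import struct


def split_and_verify(blob, expected_hash):
    """Check the leading 64-bit hash and return the payload that follows it."""
    if len(blob) < 8:
        raise ValueError("message too short to contain a type hash")
    # Assumption: the hash is stored big-endian, like the other fields.
    (found,) = struct.unpack(">Q", blob[:8])
    if found != expected_hash:
        raise ValueError(f"type hash mismatch: got {found:#018x}")
    return blob[8:]
```

A mismatch here usually means the sender and receiver were built from different versions of the type definition.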

To achieve this lofty goal, it is crucial to get the type hash computation right. We must ensure that a hash uniquely identifies a type layout. The hash is not intended to be cryptographic, but instead to catch programming and configuration errors.

Hashing primitives:

i64 hashbyte(i64 hash, byte v)
{
    return ((((u64)hash)<<8) ^ (((u64)hash)>>55)) + v;
}

i64 hashstring(i64 hash, string s)
{
    hash = hashbyte(hash, s.length);
    for (b in s)
        hash = hashbyte(hash, b);
    return hash;
}

Hashing zcmtypes:

i64 hashtype()
{
    i64 hash = 0x12345678;

    if (HASH_TYPENAME)
        hash = hashstring(hash, zcmtype_name);

    for (fld in fields) {
        if (HASH_MEMBER_NAMES)
            hash = hashstring(hash, fld.name);

        // Hash the type (only if it's a primitive)
        if (isPrimitiveType(fld.typename))
            hash = hashstring(hash, fld.typename);

        // Hash the array dimensionality
        hash = hashbyte(hash, fld.numdims);
        for (dim in fld.dimlist) {
            hash = hashbyte(hash, dim.mode);   // static (0) or dynamic (1)
            hash = hashstring(hash, dim.size); // the text between [] from the .zcm file
        }
    }

    return hash;
}
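
For readers who want to experiment, the pseudocode can be transcribed into runnable Python roughly as follows. This is a sketch under stated assumptions, not the zcmgen implementation: hashes are kept as unsigned 64-bit values by masking, the right shift of 55 follows LCM's `hash_update` (from which this scheme appears to derive), each value passed to `hashbyte` is reduced to one byte, and the two flags are passed explicitly because their values are not given here. The name `base_hash` and the tuple layout of `fields` are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keep every intermediate value within 64 bits
PRIMITIVES = {"int8_t", "int16_t", "int32_t", "int64_t",
              "float", "double", "string", "boolean", "byte"}


def hashbyte(h, v):
    """Mix one byte into the hash: shift left by 8, fold the high bits back in, add v."""
    # Assumption: right shift of 55, as in LCM's hash_update.
    return ((((h << 8) & MASK64) ^ (h >> 55)) + (v & 0xFF)) & MASK64


def hashstring(h, s):
    """Mix in the length of s, then each of its bytes."""
    data = s.encode("utf-8")
    h = hashbyte(h, len(data))
    for b in data:
        h = hashbyte(h, b)
    return h


def base_hash(type_name, fields, *, hash_typename, hash_member_names):
    """fields: list of (name, typename, dims); dims: list of (mode, size_text)."""
    h = 0x12345678
    if hash_typename:
        h = hashstring(h, type_name)
    for name, typename, dims in fields:
        if hash_member_names:
            h = hashstring(h, name)
        if typename in PRIMITIVES:  # nested zcmtypes are skipped here
            h = hashstring(h, typename)
        h = hashbyte(h, len(dims))
        for mode, size_text in dims:
            h = hashbyte(h, mode)         # 0 = static, 1 = dynamic
            h = hashstring(h, size_text)  # text between the brackets
    return h
```

A field `double vals[n]` would be passed as `("vals", "double", [(1, "n")])`, with mode 1 marking the dimension as dynamic.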

The hashing function above works well, but an observant reader will quickly notice that it completely ignores nested zcmtypes. This is done because zcmtypes may be defined in different files and thus the type generator may not have access to their definitions. To resolve this, ZCM defers the final hash computation until runtime, when it can use all dependent types.

The final hash computation will be triggered on a type's first runtime use and will recurse into nested types as needed. The hash code computed above in hashtype() is typically called the base hash because it's used as the starting point in the recursive-nested hash computation. The recursive computation is fairly simple. The algorithm proceeds as follows:

i64 TYPE_hash_recursive()
{
    u64 hash = BASE_HASH
             + SUBTYPE1_hash_recursive()
             + SUBTYPE2_hash_recursive()
             + SUBTYPE3_hash_recursive()
             ...;

    return ROTL(hash, 1); // rotate left by 1
}
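
The recursion can be sketched in Python as shown below. The dictionaries `base_hashes` and `subtypes` are a hypothetical stand-in for whatever the generated code uses to look up each type, and the sum is assumed to wrap at 64 bits. Since circular dependencies are not supported, the sketch makes no attempt to detect cycles and would not terminate on one.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rotl64(x, k):
    """Rotate a 64-bit value left by k bits (0 < k < 64)."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def hash_recursive(type_name, base_hashes, subtypes):
    """base_hashes: name -> base hash; subtypes: name -> nested zcmtype names."""
    total = base_hashes[type_name]
    for sub in subtypes.get(type_name, []):
        # Add each nested type's final hash, wrapping at 64 bits.
        total = (total + hash_recursive(sub, base_hashes, subtypes)) & MASK64
    return rotl64(total, 1)
```

Because each level rotates its sum, a type's final hash changes whenever the layout of any type nested inside it changes.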

Packages

Zcmgen allows the user to specify the package of a zcmtype, which is then used on a language-by-language basis to group types into namespaces, modules, etc. The syntax for specifying the package is shown in the example below, which constructs a type bar within the package foo. Note that the specified package can actually be multiple nested packages, i.e., replacing foo with foo1.foo2 would instead place the type bar within the package foo2, which itself is within the package foo1.

package foo;
struct bar {
    baz  b;
    .qux q;
};

When a type belongs to a package, all nonprimitive types within that type are assumed to also be from that package. In the example above, the zcmtype foo.bar contains a member b of type foo.baz (i.e., the package foo is automatically prepended to the specified type baz because the zcmtype bar is from the package foo). Should the user wish to specify a type that does not belong to the same package as the containing type, they can prepend the type with a . as in the case of the member q from the example, whose type qux will not belong to any package. This also allows the user to specify a member type from a completely separate package by placing a leading . before the package. For instance, if the zcmtype qux actually belonged to a package quuz (that is not part of foo), replacing .qux with .quuz.qux would properly specify the desired type.

Note also that although some languages allow unqualified access to types from parent packages, the zcmtype specification does not. Specifically, for the following two types, note that t2 must specify its t1 member as .foo.t1 even though t2 itself exists within a child package of foo.

package foo;
struct t1 {
    int8_t a;
};

package foo.bar;
struct t2 {
    .foo.t1 b;
};
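
The package rules described above amount to a small name-resolution function. The Python sketch below is one way to express them; `resolve_type` is a hypothetical helper and is not part of zcmgen.

```python
PRIMITIVES = {"int8_t", "int16_t", "int32_t", "int64_t",
              "float", "double", "string", "boolean", "byte"}


def resolve_type(type_text, current_package):
    """Return the fully qualified name of a member type written inside current_package."""
    if type_text in PRIMITIVES:
        return type_text  # primitives never belong to a package
    if type_text.startswith("."):
        return type_text[1:]  # leading '.' means: qualify from the root
    if current_package:
        # Unqualified names are assumed to live in the containing type's package.
        return f"{current_package}.{type_text}"
    return type_text
```

With the examples from this section, `resolve_type("baz", "foo")` gives `foo.baz`, `resolve_type(".qux", "foo")` gives `qux`, and `resolve_type(".foo.t1", "foo.bar")` gives `foo.t1`.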

Back [Home](../README.md)