#### Note: There has been a recent change to the ZCM hashing system. Check out the announcement here
This page describes the ZCM Type System grammar, encoding, and type hashes in very formal terms. Unless you're intimately concerned with the subtleties, feel free to skim this document and refer back to it as a reference.
Type | Description |
---|---|
int8_t | 8-bit signed integer |
int16_t | 16-bit signed integer |
int32_t | 32-bit signed integer |
int64_t | 64-bit signed integer |
float | 32-bit IEEE floating point value |
double | 64-bit IEEE floating point value |
string | UTF-8 string |
boolean | true/false logical value |
byte | 8-bit value |
The grammar is given in EBNF using regex-style repetition and character classes:

    file          = zcmtype*
    zcmtype       = 'struct' name '{' field* '}'
    field         = const_field | data_field
    const_field   = 'const' const_type name '=' const_literal ';'
    const_type    = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t' | 'float' | 'double'
    const_literal = hex_literal | int_literal | float_literal
    data_field    = type name arraydim* ';'
    type          = primitive | name
    primitive     = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t' | 'float' | 'double' | 'string' | 'boolean' | 'byte'
    int_type      = 'int8_t' | 'int16_t' | 'int32_t' | 'int64_t'
    arraydim      = '[' arraysize ']'
    arraysize     = name | uint_literal
    name          = underalpha underalphanum*
    underalpha    = [A-Za-z_]
    underalphanum = [A-Za-z0-9_]
    hex_literal   = "0x" hexdigit+
    hexdigit      = [0-9A-Fa-f]
    uint_literal  = [0-9]+
    int_literal   = '-'? uint_literal
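The lexical rules for `name` and `arraysize` translate directly into regular expressions. The following is a minimal Python sketch, not part of zcmgen; the helper names `is_name` and `classify_arraysize` are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the grammar rules `name` and `uint_literal`.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
UINT_LITERAL = re.compile(r"[0-9]+")


def is_name(text):
    """Return True if `text` matches the `name` rule."""
    return NAME.fullmatch(text) is not None


def classify_arraysize(text):
    """Classify the text between '[' and ']' of an `arraydim`.

    Returns "static" for a `uint_literal` and "dynamic" for a `name`;
    raises ValueError for anything the `arraysize` rule does not allow.
    """
    if UINT_LITERAL.fullmatch(text):
        return "static"
    if is_name(text):
        return "dynamic"
    raise ValueError(f"not a valid arraysize: {text!r}")
```

Because a `name` can never start with a digit, the two alternatives of `arraysize` cannot overlap, so the order of the checks does not matter.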
To be well-formed, a definition that matches the grammar above must also satisfy the following constraints:
- Each field's name must be unique
- Names used for array sizes must refer to a field in the same 'zcmtype' that has a scalar integer type
- Names used for 'type' must refer to other 'zcmtype' definitions
  - These may exist in other files
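The first two constraints can be checked mechanically once a struct has been parsed. The sketch below is illustrative only: the `(type_name, field_name, dims)` tuple layout and the function `check_struct` are assumptions made for this example, and the third constraint (resolving type names, possibly across files) is deliberately left out.

```python
import re

INT_TYPES = {"int8_t", "int16_t", "int32_t", "int64_t"}


def check_struct(fields):
    """Check one parsed struct and return a list of error strings.

    `fields` is a list of (type_name, field_name, dims) tuples, where
    `dims` holds the text of each array dimension, e.g. ["n", "3"].
    This tuple layout is a hypothetical parser output, not a ZCM API.
    """
    errors = []
    by_name = {}
    # Constraint 1: field names must be unique.
    for type_name, field_name, dims in fields:
        if field_name in by_name:
            errors.append(f"duplicate field name: {field_name}")
        else:
            by_name[field_name] = (type_name, dims)

    # Constraint 2: a named array size must be a scalar integer field.
    for type_name, field_name, dims in fields:
        for size in dims:
            if re.fullmatch(r"[0-9]+", size):
                continue  # static dimension, nothing to resolve
            target = by_name.get(size)
            if target is None:
                errors.append(f"{field_name}: unknown array size field {size}")
            elif target[0] not in INT_TYPES or target[1]:
                errors.append(f"{field_name}: {size} is not a scalar integer field")
    return errors
```

An empty list means the struct passed both checks.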
Type | Encoded Size | Format |
---|---|---|
int8_t | 1 byte | X |
int16_t | 2 bytes | XX |
int32_t | 4 bytes | XXXX |
int64_t | 8 bytes | XXXXXXXX |
float | 4 bytes | XXXX |
double | 8 bytes | XXXXXXXX |
string | 4+len+1 bytes | LLLL<chars>N |
boolean | 1 byte | X |
byte | 1 byte | X |
Where:
- X is a data byte
- L is a length byte
- N is a null byte
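The table can be exercised with Python's `struct` module. This is a sketch under several assumptions that the table itself does not state: multi-byte values are written big-endian, the four length bytes of a string count the characters plus the terminating null, and a boolean is written as 0 or 1. The function name `encode_primitive` is hypothetical.

```python
import struct

# Assumption: big-endian (">") byte order for every multi-byte value.
_FORMATS = {
    "int8_t": ">b",
    "int16_t": ">h",
    "int32_t": ">i",
    "int64_t": ">q",
    "float": ">f",
    "double": ">d",
    "byte": ">B",
}


def encode_primitive(type_name, value):
    """Encode a single primitive value to bytes."""
    if type_name == "boolean":
        # Assumption: true is written as 1 and false as 0.
        return b"\x01" if value else b"\x00"
    if type_name == "string":
        chars = value.encode("utf-8")
        # Assumption: the length field counts the trailing null byte.
        return struct.pack(">I", len(chars) + 1) + chars + b"\x00"
    return struct.pack(_FORMATS[type_name], value)
```

For example, `encode_primitive("string", "hi")` yields 4 + 2 + 1 = 7 bytes, which matches the `4+len+1` size in the table.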
Array types are encoded as a simple series of the element type. The encoding does NOT include a length field for the dimensions. For static array dimensions, the size is already known by the decoder. For dynamic array dimensions, the size is encoded in another field (as mandated by the grammar). For these reasons, there is zero encoding overhead for arrays. This includes nested types.
Nested types are also encoded with zero overhead. Since the decoder knows the layout, there is no reason to encode type metadata. Circular type dependencies are not currently supported.
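The zero-overhead claims for arrays and nested types can be seen in a small sketch. The two types used here, `point` and `path`, are hypothetical and defined only in the docstrings; big-endian byte order is an assumption, and the type hash that starts a full message is omitted.

```python
import struct


def encode_point(x, y):
    """Encode the hypothetical type: struct point { int16_t x; int16_t y; }"""
    return struct.pack(">hh", x, y)


def encode_path(points):
    """Encode the hypothetical type: struct path { int32_t n; point pts[n]; }"""
    out = struct.pack(">i", len(points))  # the field `n`
    for x, y in points:
        # Elements are written back to back: no length prefix for the
        # array and no metadata for the nested type.
        out += encode_point(x, y)
    return out


def decode_path(blob):
    """Decode a `path`; the array size comes from the field `n`."""
    (n,) = struct.unpack_from(">i", blob, 0)
    return [struct.unpack_from(">hh", blob, 4 + 4 * i) for i in range(n)]
```

A `path` with two points encodes to 4 + 2 * 4 = 12 bytes, exactly the sum of its primitive fields.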
The optimized encoding formats specified above are made possible by a type hash. Each encoded message starts with a 64-bit hash field, which is the only per-message size overhead in ZCM Type encodings. Without the hash, the encoded data is at most the same size as an equivalent C struct. The hash also serves as a unique type identifier: it allows a decoder function to verify that a binary blob of data is encoded as expected.
To achieve this lofty goal, it is crucial to get the type hash computation right. We must ensure that a hash uniquely identifies a type layout. The hash is not intended to be cryptographic, but instead to catch programming and configuration errors.
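How a decoder might use the leading hash can be sketched in a few lines. The function names `frame` and `unframe` are hypothetical, and big-endian byte order for the hash field is an assumption; the payload is treated as an opaque, already encoded byte string.

```python
import struct


def frame(type_hash, payload):
    """Prepend the 64-bit type hash to an already encoded payload."""
    return struct.pack(">Q", type_hash) + payload


def unframe(expected_hash, blob):
    """Return the payload if the leading hash matches, else raise ValueError."""
    if len(blob) < 8:
        raise ValueError("message shorter than the 8-byte hash field")
    (found,) = struct.unpack_from(">Q", blob, 0)
    if found != expected_hash:
        raise ValueError(
            f"type hash mismatch: expected {expected_hash:#018x}, got {found:#018x}"
        )
    return blob[8:]
```

A mismatch usually means the sender and receiver were built from different versions of a type definition, which is the kind of configuration error the hash is meant to catch.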
Hashing primitives:
    i64 hashbyte(i64 hash, byte v)
    {
        return ((((u64)hash)<<8) ^ (((u64)hash)>>55)) + v;
    }
    i64 hashstring(i64 hash, string s)
    {
        hash = hashbyte(hash, s.length);
        for (b in s)
            hash = hashbyte(hash, b);
        return hash;
    }
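Python integers are unbounded, so the 64-bit wraparound that the C-style pseudocode gets implicitly has to be made explicit with a mask. The sketch below mirrors the two primitives under stated assumptions: byte values are treated as unsigned (0 to 255, no sign extension), strings are shorter than 256 bytes so the length fits in one byte, and the right-shift amount is taken as the parameter `rshift` rather than hard-coded, so the caller passes the constant used in `hashbyte` above.

```python
MASK64 = (1 << 64) - 1


def hashbyte(h, v, rshift):
    """Mix one byte `v` into the 64-bit hash `h`."""
    h &= MASK64
    return (((h << 8) ^ (h >> rshift)) + v) & MASK64


def hashstring(h, s, rshift):
    """Mix the length of `s` and then each of its bytes into `h`."""
    data = s.encode("utf-8")
    # Assumption: the length is truncated to a single byte.
    h = hashbyte(h, len(data) & 0xFF, rshift)
    for b in data:
        h = hashbyte(h, b, rshift)
    return h
```

Both functions return the updated hash, so calls are chained by feeding each result into the next call.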
Hashing zcmtypes:

    i64 hashtype()
    {
        i64 hash = 0x12345678;
        if (HASH_TYPENAME)
            hash = hashstring(hash, zcmtype_name);
        for (fld in fields) {
            if (HASH_MEMBER_NAMES)
                hash = hashstring(hash, fld.name);
            // Hash the type (only if it's a primitive)
            if (isPrimitiveType(fld.typename))
                hash = hashstring(hash, fld.typename);
            // Hash the array dimensionality
            hash = hashbyte(hash, fld.numdims);
            for (dim in fld.dimlist) {
                hash = hashbyte(hash, dim.mode);   // static (0) or dynamic (1)
                hash = hashstring(hash, dim.size); // the text between [] in the .zcm file
            }
        }
        return hash;
    }
The hashing function above works well, but an observant reader will quickly notice that it completely ignores nested zcmtypes. This is done because zcmtypes may be defined in different files, and thus the type generator may not have access to their definitions. To resolve this, ZCM defers the final hash computation until runtime, when all dependent types are available.
The final hash computation is triggered on a type's first runtime use and recurses into nested types as needed. The hash code computed above in hashtype() is typically called the base hash because it is used as the starting point of the recursive nested hash computation. The recursive computation is fairly simple. The algorithm proceeds as follows:

    i64 TYPE_hash_recursive()
    {
        u64 hash = BASE_HASH
                 + SUBTYPE1_hash_recursive()
                 + SUBTYPE2_hash_recursive()
                 + SUBTYPE3_hash_recursive()
                 ...;
        return ROTL(hash, 1); // rotate left by 1
    }
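The recursion can be sketched in Python with two lookup tables standing in for the generated per-type functions. The names `rotl64` and `hash_recursive` and the dictionary layout are assumptions for illustration; the sketch also assumes there are no circular type dependencies and caches each result so a type is computed once.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, count):
    """Rotate a 64-bit value left by `count` bits (0 < count < 64)."""
    value &= MASK64
    return ((value << count) | (value >> (64 - count))) & MASK64


def hash_recursive(type_name, base_hashes, subtypes, cache=None):
    """Return the final hash of `type_name`.

    `base_hashes` maps a type name to its base hash, and `subtypes`
    maps a type name to the nested zcmtype names to sum over.
    Assumes the dependency graph has no cycles.
    """
    if cache is None:
        cache = {}
    if type_name not in cache:
        total = base_hashes[type_name]
        for sub in subtypes.get(type_name, []):
            total += hash_recursive(sub, base_hashes, subtypes, cache)
        # rotl64 masks to 64 bits, so the sum wraps like a u64 would.
        cache[type_name] = rotl64(total, 1)
    return cache[type_name]
```

With base hashes `{"point": 1, "line": 3}` and `line` nesting `point`, the final hash of `point` is `ROTL(1, 1) = 2` and that of `line` is `ROTL(3 + 2, 1) = 10`.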
Zcmgen allows the user to specify the package of a zcmtype, which is then used on a
language-by-language basis to group types into namespaces, modules, etc. The syntax for
specifying the package is shown in the example below, which constructs a type `bar`
within the package `foo`. Note that the specified package can actually be multiple nested
packages, i.e. replacing `foo` with `foo1.foo2` would instead place the type `bar` within
the package `foo2`, which itself is within the package `foo1`.

    package foo;
    struct bar {
        baz b;
        .qux q;
    };
When a type belongs to a package, all nonprimitive types within that type are assumed to
also be from that package. In the example above, the zcmtype `foo.bar` contains a member
`b` of type `foo.baz` (i.e. the package `foo` is automatically prepended to the specified
type `baz` because the zcmtype `bar` is from the package `foo`). Should the user wish
to specify a type that does not belong to the same package as the containing type, they
can prepend the type with a `.`, as in the case of the member `q` from the example, whose
type will not belong to any package. This also allows the user to specify a member type from a
completely separate package by prepending a leading `.` before the package. For instance,
if the zcmtype `qux` actually belonged to a package `quuz` (that is not part of `foo`),
replacing `.qux` with `.quuz.qux` would properly specify the desired type.
Note also that although some languages allow unqualified access to types from
parent packages, the zcmtype specification does not. Specifically, for the following
two types, note that `t2` must specify its `t1` member as existing within the package
`.foo` even though `t2` itself exists within a child package of `foo`.

    package foo;
    struct t1 {
        int8_t a;
    };

    package foo.bar;
    struct t2 {
        .foo.t1 b;
    };
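The package rules described above amount to a small name-resolution function. The following Python sketch is an illustration, not zcmgen's implementation; the function name `resolve_type` is hypothetical. The text does not say how a dotted name without a leading `.` is treated, so the sketch rejects that case rather than guess.

```python
PRIMITIVES = {
    "int8_t", "int16_t", "int32_t", "int64_t",
    "float", "double", "string", "boolean", "byte",
}


def resolve_type(package, type_name):
    """Return the fully qualified name of a member type.

    `package` is the package of the containing zcmtype, e.g. "foo" or
    "foo.bar", or "" if the containing type has no package.
    """
    if type_name in PRIMITIVES:
        return type_name  # primitives never belong to a package
    if type_name.startswith("."):
        return type_name[1:]  # leading '.' means: resolve from the root
    if "." in type_name:
        # Not covered by the text above, so this sketch refuses it.
        raise ValueError(f"unsupported type name: {type_name!r}")
    if package:
        return f"{package}.{type_name}"
    return type_name
```

Applied to the examples above, `resolve_type("foo", "baz")` gives `foo.baz`, `resolve_type("foo", ".qux")` gives `qux`, and `resolve_type("foo.bar", ".foo.t1")` gives `foo.t1`. Writing plain `t1` inside `foo.bar` would resolve to `foo.bar.t1`, which is why `t2` needs the leading `.foo`.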
Back [Home](../README.md)