Technical Docs
The serialization format used by Cereal is simple: every value is stored as raw big-endian data, preceded by one byte that identifies its data type.
Strings (std::string) are stored a bit differently: first, a short (2 bytes) gives the number of characters in the string, and then the characters follow one after the other as single-byte ASCII characters. So the string Database would be encoded as 0x00 0x08 0x44 0x61 0x74 0x61 0x62 0x61 0x73 0x65 (8 characters: D, a, t, a, b, a, s, e).
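The string layout above can be sketched in a few lines of C++. This is only an illustration of the byte layout, not Cereal's own code; the helper names writeU16BE, writeString and encodeString are made up for this example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append a 16-bit value in big-endian order (most significant byte first).
void writeU16BE(std::vector<std::uint8_t>& out, std::uint16_t value) {
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value & 0xFF));
}

// Append a length-prefixed string: a 2-byte length, then one byte per character.
// Hypothetical helper; the length check reflects the 2-byte prefix described above.
void writeString(std::vector<std::uint8_t>& out, const std::string& text) {
    if (text.size() > 0xFFFF) {
        throw std::length_error("string too long for a 2-byte length prefix");
    }
    writeU16BE(out, static_cast<std::uint16_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
}

// Convenience wrapper that returns the encoded bytes.
std::vector<std::uint8_t> encodeString(const std::string& text) {
    std::vector<std::uint8_t> out;
    writeString(out, text);
    return out;
}
```

Calling encodeString("Database") produces the ten bytes listed above: the length 0x00 0x08 followed by the eight ASCII characters.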
In Cereal, headers contain databases, databases contain objects, and objects contain fields and arrays. This is important in order to organize our databases.
Deletion follows the same hierarchy: a header deletes its databases when it is deleted, a database deletes its objects, and an object deletes its fields and arrays. So, to free memory, delete header; is enough to delete everything if we are using headers, and delete database; is enough if we are only using a database without any header.
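The cascading deletion can be pictured with a small stand-alone sketch. The Header, Database and Object structs below are simplified stand-ins written for this example, not Cereal's real classes, and the liveNodes counter exists only to show that nothing is left allocated.

```cpp
#include <vector>

int liveNodes = 0;  // number of stand-in nodes currently allocated

struct Object {
    Object() { ++liveNodes; }
    ~Object() { --liveNodes; }  // a real object would also delete its fields and arrays
};

struct Database {
    std::vector<Object*> objects;
    Database() { ++liveNodes; }
    ~Database() {
        for (Object* object : objects) delete object;  // the database owns its objects
        --liveNodes;
    }
};

struct Header {
    std::vector<Database*> databases;
    Header() { ++liveNodes; }
    ~Header() {
        for (Database* database : databases) delete database;  // the header owns its databases
        --liveNodes;
    }
};

// Builds a header with one database and two objects, deletes only the header,
// and returns how many nodes are still alive afterwards.
int nodesLeftAfterDeletingHeader() {
    Header* header = new Header;
    Database* database = new Database;
    database->objects.push_back(new Object);
    database->objects.push_back(new Object);
    header->databases.push_back(database);
    delete header;
    return liveNodes;
}
```

With this ownership model, deleting a database or an object by hand after its parent has been deleted would be a double delete, so only the top-level unit should be deleted.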
With that in mind, we can move on to the serialization units:
Fields start with a byte that identifies them as a field (value 0x09). Next comes a short indicating the length of the name, followed by the name in ASCII.
Finally, there is another byte indicating the data type, followed by the data itself (1 to 8 bytes long, depending on the data type).
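As a rough sketch of the field layout, the function below encodes a field that holds a 32-bit integer. The data type codes are not listed in this page, so typeCode is passed in by the caller; the names writeBE, writeString and encodeInt32Field are hypothetical and not part of Cereal's API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint8_t kFieldMarker = 0x09;  // unit marker for fields, as described above

// Append the low `byteCount` bytes of `value`, most significant byte first.
void writeBE(std::vector<std::uint8_t>& out, std::uint64_t value, int byteCount) {
    for (int shift = (byteCount - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Append a 2-byte length followed by the characters (assumes the name fits in 16 bits).
void writeString(std::vector<std::uint8_t>& out, const std::string& text) {
    writeBE(out, text.size(), 2);
    out.insert(out.end(), text.begin(), text.end());
}

// Encode a field holding a 32-bit integer.
// `typeCode` is an assumption: the caller must supply whatever code the format uses.
std::vector<std::uint8_t> encodeInt32Field(const std::string& name,
                                           std::uint8_t typeCode,
                                           std::int32_t value) {
    std::vector<std::uint8_t> out;
    out.push_back(kFieldMarker);                         // 1 byte: field marker
    writeString(out, name);                              // 2 bytes + name
    out.push_back(typeCode);                             // 1 byte: data type
    writeBE(out, static_cast<std::uint32_t>(value), 4);  // 4 bytes: the value
    return out;
}
```

For instance, encodeInt32Field("id", 0x04, 258), where 0x04 is a made-up type code, yields 0x09 0x00 0x02 0x69 0x64 0x04 0x00 0x00 0x01 0x02.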
Arrays also start with a byte that identifies them as an array (value 0x0A), for compatibility with fields and objects. After that comes a short that indicates the length of the name, followed by the bytes of the name.
Next, as with fields, comes the byte indicating the data type of the array, followed by four bytes indicating the item count of the array. After that come all the raw bytes of the array.
Use sizeof(data type) * count to work out the number of bytes the array data uses.
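The array layout can be sketched the same way, here for an array of 16-bit integers. As before, the type code is supplied by the caller because the actual codes are not given here, and all function names are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint8_t kArrayMarker = 0x0A;  // unit marker for arrays, as described above

// Append the low `byteCount` bytes of `value`, most significant byte first.
void writeBE(std::vector<std::uint8_t>& out, std::uint64_t value, int byteCount) {
    for (int shift = (byteCount - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Append a 2-byte length followed by the characters (assumes the name fits in 16 bits).
void writeString(std::vector<std::uint8_t>& out, const std::string& text) {
    writeBE(out, text.size(), 2);
    out.insert(out.end(), text.begin(), text.end());
}

// Payload size in bytes: sizeof(data type) * count.
std::size_t arrayPayloadSize(std::size_t elementSize, std::size_t count) {
    return elementSize * count;
}

// Encode an array of 16-bit integers. `typeCode` is an assumption supplied by the caller.
std::vector<std::uint8_t> encodeInt16Array(const std::string& name,
                                           std::uint8_t typeCode,
                                           const std::vector<std::int16_t>& items) {
    std::vector<std::uint8_t> out;
    out.push_back(kArrayMarker);    // 1 byte: array marker
    writeString(out, name);         // 2 bytes + name
    out.push_back(typeCode);        // 1 byte: data type
    writeBE(out, items.size(), 4);  // 4 bytes: item count
    out.reserve(out.size() + arrayPayloadSize(sizeof(std::int16_t), items.size()));
    for (std::int16_t item : items) {
        writeBE(out, static_cast<std::uint16_t>(item), 2);  // each item, big-endian
    }
    return out;
}
```

An array named xs with the two items 1 and -2 takes 1 + 2 + 2 + 1 + 4 = 10 bytes of framing plus sizeof(int16_t) * 2 = 4 bytes of data.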
Again, objects start with a byte that identifies them as an object (value 0x0B). Next comes a string with the name of the object, followed by a short with the field count of the object. After the field count come the fields, one after the other, each laid out as described above. Finally, there is another short (the array count), followed by the arrays.
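To illustrate how an object wraps its children, the sketch below assembles an object from fields and arrays that have already been encoded to bytes. The Bytes alias and the function names are assumptions made for this example only.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kObjectMarker = 0x0B;  // unit marker for objects, as described above

// Append a 16-bit value in big-endian order.
void writeU16BE(Bytes& out, std::uint16_t value) {
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value & 0xFF));
}

// Append a 2-byte length followed by the characters (assumes the name fits in 16 bits).
void writeString(Bytes& out, const std::string& text) {
    writeU16BE(out, static_cast<std::uint16_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
}

// Encode an object from already-encoded fields and arrays
// (assumes each count fits in the 2-byte counter).
Bytes encodeObject(const std::string& name,
                   const std::vector<Bytes>& fields,
                   const std::vector<Bytes>& arrays) {
    Bytes out;
    out.push_back(kObjectMarker);                                // 1 byte: object marker
    writeString(out, name);                                      // name as a string
    writeU16BE(out, static_cast<std::uint16_t>(fields.size()));  // 2 bytes: field count
    for (const Bytes& field : fields) {
        out.insert(out.end(), field.begin(), field.end());
    }
    writeU16BE(out, static_cast<std::uint16_t>(arrays.size()));  // 2 bytes: array count
    for (const Bytes& array : arrays) {
        out.insert(out.end(), array.begin(), array.end());
    }
    return out;
}
```

An object with no fields and no arrays still occupies 1 + 2 + name length + 2 + 2 bytes, because both counters are always written.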
Databases start with a short (two bytes) describing the version. Currently, there are two versions for databases, 1.0 and 2.0, and they are represented as 0x0100 and 0x0200 respectively.
Next comes the string containing the name of the database, followed by four bytes with the size of the database. This is why a database cannot be larger than four gigabytes: four bytes can only represent sizes up to 4 GB (2^32 - 1 bytes).
Finally, there is the object count as a short, followed by all the objects as described above.
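Reading the start of a database back could look roughly like the sketch below. It only parses the values that precede the objects (version, name, size and object count) and makes no assumption about what the size field covers. DatabasePrefix, readBE and readDatabasePrefix are hypothetical names, not Cereal functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Values stored before the objects of a database.
struct DatabasePrefix {
    std::uint16_t version;      // 0x0100 for 1.0, 0x0200 for 2.0
    std::string name;
    std::uint32_t size;         // four bytes, hence the 4 GB limit
    std::uint16_t objectCount;
};

// Read `byteCount` (at most 4) big-endian bytes starting at `pos` and advance `pos`.
std::uint32_t readBE(const std::vector<std::uint8_t>& in, std::size_t& pos, std::size_t byteCount) {
    if (pos > in.size() || byteCount > in.size() - pos) {
        throw std::out_of_range("truncated input");
    }
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        value = (value << 8) | in[pos++];
    }
    return value;
}

// Parse the database prefix from the start of `in`.
DatabasePrefix readDatabasePrefix(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    DatabasePrefix prefix;
    prefix.version = static_cast<std::uint16_t>(readBE(in, pos, 2));
    const std::size_t nameLength = readBE(in, pos, 2);
    if (nameLength > in.size() - pos) {
        throw std::out_of_range("truncated name");
    }
    prefix.name.assign(reinterpret_cast<const char*>(in.data()) + pos, nameLength);
    pos += nameLength;
    prefix.size = readBE(in, pos, 4);
    prefix.objectCount = static_cast<std::uint16_t>(readBE(in, pos, 2));
    return prefix;
}
```

The objects themselves would start right after the object count, at the position where this sketch stops reading.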
Headers start with a magic value, that is, a value used to check whether the data is a header or not. It may be changed in future releases.
After the short 0x524D (the magic value) comes the database count as a byte, so a header can store up to 255 databases. Next comes an array of unsigned ints with the offsets where the databases start, and finally the databases themselves, just as described above.
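A header could be assembled along these lines from databases that are already encoded. Two things are assumptions here, since the text does not spell them out: each offset is written as a plain 4-byte big-endian value, and offsets are counted from the first byte of the header. The names Bytes, writeBE and encodeHeader are likewise invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint16_t kHeaderMagic = 0x524D;  // magic value, as described above

// Append the low `byteCount` bytes of `value`, most significant byte first.
void writeBE(Bytes& out, std::uint64_t value, int byteCount) {
    for (int shift = (byteCount - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }
}

// Encode a header around already-encoded databases.
Bytes encodeHeader(const std::vector<Bytes>& databases) {
    if (databases.size() > 255) {
        throw std::length_error("a header holds at most 255 databases");
    }
    Bytes out;
    writeBE(out, kHeaderMagic, 2);                               // 2 bytes: magic value
    out.push_back(static_cast<std::uint8_t>(databases.size()));  // 1 byte: database count

    // Assumption: offsets are measured from the start of the header, so the first
    // database begins after the magic, the count, and the offset table itself.
    std::size_t offset = 2 + 1 + 4 * databases.size();
    for (const Bytes& database : databases) {
        writeBE(out, offset, 4);                                 // 4 bytes per offset
        offset += database.size();
    }
    for (const Bytes& database : databases) {
        out.insert(out.end(), database.begin(), database.end());
    }
    return out;
}
```

With the offset table in place, a reader can jump straight to any database without parsing the ones before it.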
Copyright © 2016 - 2019 The Cereal Team