
Serialization Reference

Felix Gündling edited this page Feb 13, 2020 · 4 revisions

Data Structures

The following data structures exist in the namespaces cista::offset and cista::raw:

  • vector<T>: serializable version of std::vector<T>
  • string: serializable version of std::string
  • unique_ptr<T>: serializable version of std::unique_ptr<T>
  • hash_map<K, V>: serializable version of std::unordered_map (using Google's Swiss Table)
  • ptr<T>: serializable pointer: cista::raw::ptr<T> is just a T*, cista::offset::ptr<T> is a specialized data structure that behaves mostly like a T* (overloaded ->, *, etc. operators).
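To make the difference between the two pointer flavours concrete, here is a minimal sketch of a self-relative pointer that stores the distance from its own address to the target instead of an absolute address. It only illustrates the general idea and is not cista's implementation: offset_ptr_sketch, record_sketch, the null sentinel, and offset_survives_copy are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch of a self-relative ("offset") pointer.
// NOT cista's implementation; every name here is made up for illustration.
template <typename T>
struct offset_ptr_sketch {
  // Sentinel for "null" (an assumption of this sketch). An offset of 0 is
  // not usable as null because it would mean "points at myself".
  static constexpr std::intptr_t kNullOffset =
      std::numeric_limits<std::intptr_t>::min();

  // Store the distance between the target and this pointer object.
  void set(T* target) {
    offset_ = (target == nullptr)
                  ? kNullOffset
                  : reinterpret_cast<std::intptr_t>(target) -
                        reinterpret_cast<std::intptr_t>(this);
  }

  // Rebuild the absolute address from the current location plus the offset.
  T* get() const {
    return (offset_ == kNullOffset)
               ? nullptr
               : reinterpret_cast<T*>(
                     reinterpret_cast<std::intptr_t>(this) + offset_);
  }

  T& operator*() const { return *get(); }
  T* operator->() const { return get(); }

  std::intptr_t offset_{kNullOffset};
};

// A record whose pointer refers to one of its own members.
struct record_sketch {
  std::int32_t value_{0};
  offset_ptr_sketch<std::int32_t> value_ptr_;
};

// Copying the record to a new address keeps the pointer valid, because only
// the relative distance is stored.
bool offset_survives_copy() {
  record_sketch a;
  a.value_ = 42;
  a.value_ptr_.set(&a.value_);

  record_sketch b = a;  // member-wise copy to a different address
  b.value_ = 7;

  assert(a.value_ptr_.get() == &a.value_);
  assert(b.value_ptr_.get() == &b.value_);
  return *a.value_ptr_ == 42 && *b.value_ptr_ == 7;
}
```

Because only a relative distance is stored, a buffer holding such pointers can be loaded at any address without fix-ups. A raw T*, in contrast, has to be rewritten to an absolute address after loading.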

Currently, they do not provide exactly the same interface as their std:: equivalents.

Pointers

A cista::ptr<T> can only point to null or to a value stored in the serialized buffer. Pointing to a value within the serialized buffer requires that the offset it was written at is known at serialization time.

There are three ways to index an address in order to serialize a pointer to it:

  • cista::unique_ptr<T>: Every cista::unique_ptr<T> will be indexed. Thus, pointing to values held by a cista::unique_ptr<T> is possible.
  • cista::indexed_vector<T>: Within a cista::indexed_vector<T>, every value can be referenced. This is more efficient than a cista::vector<cista::unique_ptr<T>>. However, neither cista::vector<T> nor cista::indexed_vector<T> provides pointer stability across non-const operations such as resize or emplace_back.
  • cista::indexed<T>: To be able to point to the value of member variables, it is possible to use cista::indexed<T>. cista::indexed<T> inherits from T and thus can be used just like a T.
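The pointer stability caveat is the same one that applies to contiguous containers in general. The sketch below uses std::vector as a stand-in (it does not use cista types) to show that an element's address changes when the container has to grow, while an index into the container stays meaningful. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the first element lives at a different address after the
// vector was forced to reallocate. std::vector stands in for a contiguous
// container here; this is an analogy, not cista code.
bool first_element_moved_after_growth() {
  std::vector<int> values{1, 2, 3};
  // Keep the old address as an integer so that no dangling pointer is used.
  auto const before = reinterpret_cast<std::uintptr_t>(&values[0]);

  // Requesting more than the current capacity forces a reallocation.
  values.reserve(values.capacity() * 2 + 1);

  auto const after = reinterpret_cast<std::uintptr_t>(&values[0]);
  return before != after;
}

// An index, unlike an address, still refers to the same element after growth.
bool index_survives_growth() {
  std::vector<int> values{10, 20, 30};
  std::size_t const idx = 1;
  values.reserve(values.capacity() * 2 + 1);
  assert(idx < values.size());
  return values[idx] == 20;
}
```

In practice this suggests filling such a vector completely before taking pointers into it.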

An example using cista::indexed_vector<T> and cista::indexed<T>:

namespace data = cista::offset;

struct node;

struct edge {
  data::ptr<node> from_;
  data::ptr<node> to_;
};

struct node {
  uint32_t id_{0};
  data::vector<data::ptr<edge>> edges_;
  cista::indexed<data::string> name_;
};

struct graph {
  data::indexed_vector<node> nodes_;
  data::indexed_vector<edge> edges_;
  data::vector<data::ptr<data::string>> node_names_;
};

Serialization and Deserialization Functions

Mode

Serialization and deserialization have to use the same mode. This can be ensured by storing the mode in a constexpr variable and passing that variable to both cista::serialize() and cista::deserialize().

The cista::mode enum provides the following values:

  • NONE - default mode (default values are listed below)
  • UNCHECKED - perform no bounds checks for types (only affects deserialization)
  • WITH_VERSION - store the data structure version (8 bytes), default value: off
  • WITH_INTEGRITY - store a hash sum of the serialized data (8 bytes), default value: off
  • SERIALIZE_BIG_ENDIAN - use big endian format when serializing (default: little endian)
  • DEEP_CHECK - apply deep checking for security (only affects deserialization)
  • CAST - casts the buffer pointer (with compile time checks that the buffer stays unmodified: no endian conversion and only offset pointer data structures)

The stored data structure version (cista::mode::WITH_VERSION) and hash sum (cista::mode::WITH_INTEGRITY) are checked at deserialization (if available).

Note that you cannot store the integrity checksum and/or data structure version and omit the flag at deserialization because they affect where the actual data starts.
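Why the flags have to match can be seen from a small calculation. The sketch below assumes that the version and the hash sum are each stored as an 8-byte field in front of the payload. The exact layout and field order are assumptions, and payload_offset_sketch is a hypothetical helper, not a cista function.

```cpp
#include <cassert>
#include <cstddef>

// Sizes taken from the mode descriptions (8 bytes each).
constexpr std::size_t kVersionBytes = 8;
constexpr std::size_t kIntegrityBytes = 8;

// Hypothetical helper: byte offset at which the payload starts, ASSUMING both
// optional fields are written in front of the payload.
constexpr std::size_t payload_offset_sketch(bool with_version,
                                            bool with_integrity) {
  return (with_version ? kVersionBytes : 0) +
         (with_integrity ? kIntegrityBytes : 0);
}

// How far off a reader would be if its flags differ from the writer's flags.
constexpr std::size_t mismatch_in_bytes(bool writer_version,
                                        bool writer_integrity,
                                        bool reader_version,
                                        bool reader_integrity) {
  return payload_offset_sketch(writer_version, writer_integrity) -
         payload_offset_sketch(reader_version, reader_integrity);
}

static_assert(payload_offset_sketch(false, false) == 0, "no header fields");
static_assert(payload_offset_sketch(true, true) == 16, "both header fields");
```

Under this assumption, a reader that omits both flags would interpret the first 16 header bytes as payload.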

These values work as a bit mask.

Example:

constexpr auto const MODE = cista::mode::WITH_VERSION |
                            cista::mode::WITH_INTEGRITY |
                            cista::mode::DEEP_CHECK;
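For readers unfamiliar with the bit mask pattern, the following standalone sketch shows how a scoped enum can be combined with | and queried for individual flags at compile time. mode_sketch, its bit positions, is_set, and kSketchMode are assumptions made for illustration; they mirror the names listed above but are not taken from the cista headers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a flag enum; the bit positions are assumed.
enum class mode_sketch : std::uint8_t {
  NONE = 0U,
  UNCHECKED = 1U << 0U,
  WITH_VERSION = 1U << 1U,
  WITH_INTEGRITY = 1U << 2U,
  SERIALIZE_BIG_ENDIAN = 1U << 3U,
  DEEP_CHECK = 1U << 4U,
  CAST = 1U << 5U
};

// Combine two flag sets.
constexpr mode_sketch operator|(mode_sketch a, mode_sketch b) {
  using U = std::underlying_type_t<mode_sketch>;
  return static_cast<mode_sketch>(static_cast<U>(a) | static_cast<U>(b));
}

// Check whether a single flag is contained in a flag set.
constexpr bool is_set(mode_sketch flags, mode_sketch flag) {
  using U = std::underlying_type_t<mode_sketch>;
  return (static_cast<U>(flags) & static_cast<U>(flag)) != 0;
}

// One shared constant, intended to be used for writing and for reading.
constexpr auto kSketchMode = mode_sketch::WITH_VERSION |
                             mode_sketch::WITH_INTEGRITY |
                             mode_sketch::DEEP_CHECK;

static_assert(is_set(kSketchMode, mode_sketch::WITH_VERSION), "flag is set");
static_assert(!is_set(kSketchMode, mode_sketch::CAST), "flag is not set");
```

Keeping the combined value in a single constexpr variable means the writer and the reader cannot drift apart.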

Serialization

The following methods can be used to serialize either to a std::vector<uint8_t> (default) or to an arbitrary serialization target.

  • std::vector<uint8_t> cista::serialize<mode Mode = mode::NONE, T>(T const&) serializes an object of type T and returns a buffer containing the serialized object.
  • void cista::serialize<mode const Mode = mode::NONE, Target, T>(Target&, T const&) serializes an object of type T to the specified target. Targets are either cista::buf<Buf> (where Buf can either be a simple std::vector<uint8_t> or a cista::mmap) or cista::file. Custom target structs should provide write functions as described here.

Deserialization

The following functions exist in cista::offset and cista::raw:

  • T* deserialize<T, cista::mode Mode = cista::mode::NONE, Container>(Container&) deserializes an object from a std::vector<uint8_t> or similar data structure. This function throws a std::runtime_error if the data is not well-formed.
  • T* deserialize<T, cista::mode Mode = cista::mode::NONE>(CharT* from, CharT* to = nullptr) deserializes an object from a pointer range. This function throws a std::runtime_error if the data is not well-formed.
  • reinterpret_cast<T*>(ptr): Same as deserialize<T, cista::mode::CAST>. If you are using offset mode and the machine's endianness matches that of the serialized data, you may as well just call reinterpret_cast<T*>(ptr).

Const variants:

  • T* deserialize<T, cista::mode Mode = cista::mode::NONE, Container>(Container const&) same as non-const variant above.
  • T* deserialize<T, cista::mode Mode = cista::mode::NONE>(CharT const* from, CharT const* to = nullptr) same as non-const variant above.

Note that there are requirements if the input is const: deserialization of raw pointers (e.g. in most cista::raw data structures except cista::array) as well as endian conversion are not supported as they require modification of the buffer. If you use offset mode and the deserialization does not require endian conversion, const inputs can be deserialized: they don't need to be modified.
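The reason endian conversion rules out const inputs is that the conversion rewrites bytes inside the buffer. The following standalone sketch shows an in-place byte swap of a 32-bit field, which necessarily takes a mutable buffer. byteswap32 and convert_in_place are hypothetical helpers written for this illustration and are not part of cista.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byteswap32(std::uint32_t v) {
  return (v >> 24U) | ((v >> 8U) & 0x0000FF00U) | ((v << 8U) & 0x00FF0000U) |
         (v << 24U);
}

// Convert the 32-bit field at byte position `pos` in place. The buffer
// parameter has to be non-const because the converted value is written back.
void convert_in_place(std::vector<std::uint8_t>& buf, std::size_t pos) {
  assert(pos + sizeof(std::uint32_t) <= buf.size());
  std::uint32_t v = 0;
  std::memcpy(&v, buf.data() + pos, sizeof(v));  // read without alignment UB
  v = byteswap32(v);
  std::memcpy(buf.data() + pos, &v, sizeof(v));  // write back: mutation
}

// Demonstrates that the conversion changes the bytes stored in the buffer.
bool buffer_was_modified() {
  std::vector<std::uint8_t> buf{0x12, 0x34, 0x56, 0x78};
  std::vector<std::uint8_t> const original = buf;
  convert_in_place(buf, 0);
  return buf != original;
}
```

Raw pointers follow the same logic: a stored offset has to be overwritten with an absolute address, which again requires write access to the buffer.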