Data Structure Versioning
Cista optionally supports data structure versioning by automatically computing a type hash for the serialized data structure. This makes it possible to check whether a binary buffer loaded from a file, the network, a database, etc. has the expected structure. The type hash serves as a data structure version that automatically changes when the data structure becomes binary incompatible.
Data structure versioning is by no means a replacement for security checking when reading from untrusted data sources.
Note that changes that do not affect the binary layout, like swapping the names of two consecutive member variables of the same type, cannot be detected. Hashing a type structure recursively is done at runtime.
The following example illustrates how to use it:
#include "cista.h"
struct s1 { int i; int j; int k; };
struct s2 { int i; int j; };
int main() {
  // Use one constant so serialization and deserialization share the same mode.
  constexpr auto const MODE = cista::mode::WITH_VERSION;
  s1 obj{1, 2};
  auto serialized = cista::serialize<MODE>(obj);
  // Note: this throws because s1 was serialized but we try to read s2.
  // Security checks would not throw because sizeof(s2) <= sizeof(s1).
  auto const deserialized = cista::deserialize<s2, MODE>(serialized);
}
It is important to use the same mode for deserialization that was used for serialization. The data structure version is a 64-bit value that precedes the actual data. If the serialization mode and deserialization mode do not match, the offset where the serialized data starts will be wrong. The recommended style is to introduce a constexpr variable to store the mode (see the example above).
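The effect of a mode mismatch can be illustrated without Cista. The following is a minimal sketch, not Cista's actual layout code: buffer, write_payload and read_payload are hypothetical helpers that only model "an optional 64-bit version precedes the data".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a serialized buffer (not a Cista type).
using buffer = std::vector<std::uint8_t>;

// Writes an optional 64-bit version followed by a 32-bit payload.
inline buffer write_payload(std::uint32_t const payload,
                            bool const with_version,
                            std::uint64_t const version) {
  buffer buf;
  if (with_version) {
    buf.resize(sizeof(version));
    std::memcpy(buf.data(), &version, sizeof(version));
  }
  auto const offset = buf.size();
  buf.resize(offset + sizeof(payload));
  std::memcpy(buf.data() + offset, &payload, sizeof(payload));
  return buf;
}

// Reads the payload. The reader decides where the data starts based on
// its own mode, so a mismatch makes it read from the wrong offset.
inline std::uint32_t read_payload(buffer const& buf, bool const with_version) {
  std::size_t const offset = with_version ? sizeof(std::uint64_t) : 0U;
  assert(buf.size() >= offset + sizeof(std::uint32_t));
  std::uint32_t payload;
  std::memcpy(&payload, buf.data() + offset, sizeof(payload));
  return payload;
}
```

Writing with a version and reading with a version returns the original payload. Reading the same buffer without a version interprets the first bytes of the version as payload, which is why a single shared constexpr mode constant is the safer style.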
The type hash is computed by recursively iterating the structure of the serialized data structure (e.g. using cista::for_each_field for structs, hashing T for vector<T>, etc.) and hash combining all involved type names (and some extra strings for unambiguity).
The data structures can have cycles (e.g. a graph: nodes have edges, edges have nodes), which would result in infinite recursion. Therefore, the computation keeps a map std::map<hash_t, unsigned>& which stores the types already hashed. The key (hash_t) is the hash of the type name and the value is the unique order index this type was discovered at.
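The cycle handling described above can be sketched as follows. This is an illustration under stated assumptions, not Cista's implementation: type_desc is a hypothetical runtime description of a type, and combine and hash_str are assumed FNV-1a style helpers standing in for the library's hash functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using hash_t = std::uint64_t;

// Assumed FNV-1a style helpers (stand-ins, not the Cista functions).
constexpr hash_t BASE = 14695981039346656037ULL;
inline hash_t combine(hash_t const h, hash_t const v) {
  return (h ^ v) * 1099511628211ULL;
}
inline hash_t hash_str(std::string const& s, hash_t h = BASE) {
  for (unsigned char const c : s) {
    h = combine(h, c);
  }
  return h;
}

// Hypothetical runtime description of a type: its name and member types.
struct type_desc {
  std::string name;
  std::vector<type_desc const*> members;
};

inline hash_t type_hash(type_desc const& t, hash_t h,
                        std::map<hash_t, unsigned>& done) {
  auto const name_hash = hash_str(t.name);
  // The value is the discovery order index of the type.
  auto const [it, inserted] =
      done.try_emplace(name_hash, static_cast<unsigned>(done.size()));
  if (!inserted) {
    // Type seen before: mix in its index instead of recursing again.
    return combine(h, it->second);
  }
  h = combine(h, name_hash);
  for (auto const* m : t.members) {
    assert(m != nullptr);
    h = type_hash(*m, h, done);
  }
  return h;
}
```

With a node type that contains an edge type and an edge type that contains the node type, the recursion stops at the second visit of node, and the map ends up with exactly two entries.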
Since Cista version 0.5, raw pointers store a relative offset in the serialized format. This makes them binary compatible with offset pointers. This is reflected by the type hash: a data structure has the same type hash regardless of which pointer type (cista::offset::ptr<T> or cista::raw::ptr<T>) is used. Thus, switching from namespace data = cista::raw to namespace data = cista::offset or the other way around does not require regenerating the serialized binary.
The generic type hash function contained in Cista works for all types that the serialization works for: standard layout, non-polymorphic aggregate types. For all structs with custom constructors, inheritance, etc., a custom type hash needs to be implemented when using cista::mode::WITH_VERSION.
To support type hashing for my_type, you can either use the cista_members approach as described in the chapter about custom serialization functions or implement the following function:
hash_t type_hash(my_type const& el, hash_t h, std::map<hash_t, unsigned>& done);
Parameters:
- el: an instance of your type
- h: the current hash (seed)
- done: map of discovered hashed types. Do not touch it if your type is not cyclic (e.g. graph: edge has nodes, node has edges). Pass this on to subsequent calls to type_hash<T>.
  - key: hash of the type name (see type2str_hash)
  - value: the discovery order index (hash combine with this unique index if you see the type again instead of trying to hash the whole type again)
Return value: the hash of this type.
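A custom type hash for a non-cyclic type could look roughly like the sketch below. To keep it compilable without cista.h, everything lives in a hypothetical namespace sketch: hash_t, BASE_HASH, hash_combine, hash and the type_hash overload for int are assumed FNV-1a style stand-ins for the library functions of the same name, and my_type with its members x_ and y_ is made up for illustration. With Cista, the library versions would be used instead.

```cpp
#include <cstdint>
#include <map>
#include <string_view>

namespace sketch {

using hash_t = std::uint64_t;

// Stand-ins (assumed FNV-1a style) for the Cista helpers of the same name.
constexpr hash_t BASE_HASH = 14695981039346656037ULL;

constexpr hash_t hash_combine(hash_t const h, hash_t const val) {
  return (h ^ val) * 1099511628211ULL;
}

constexpr hash_t hash(std::string_view s, hash_t h = BASE_HASH) {
  for (char const c : s) {
    h = hash_combine(h, static_cast<unsigned char>(c));
  }
  return h;
}

// Stand-in for the type hash of a primitive member.
inline hash_t type_hash(int const&, hash_t const h,
                        std::map<hash_t, unsigned>&) {
  return hash_combine(h, hash("int"));
}

// Hypothetical user type: the custom constructor means it is no aggregate.
struct my_type {
  explicit my_type(int x = 0, int y = 0) : x_{x}, y_{y} {}
  int x_;
  int y_;
};

// Custom type hash: mix in the type name, then each member in layout order.
// The type is not cyclic, so done is only passed on, never modified here.
inline hash_t type_hash(my_type const& el, hash_t h,
                        std::map<hash_t, unsigned>& done) {
  h = hash_combine(h, hash("my_type"));
  h = type_hash(el.x_, h, done);
  h = type_hash(el.y_, h, done);
  return h;
}

}  // namespace sketch
```

The result depends only on the structure of my_type, not on the values stored in the instance, so two instances with different member values produce the same hash.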
The type hash functions already implemented in Cista and the reference of Cista's hashing functions below may be helpful.
template <typename T>
constexpr hash_t hash_combine(hash_t const h, T const val);
Combines a given hash h with another hash or integer value, including char and unsigned char.
hash_t hash(std::string_view s, hash_t h = BASE_HASH);
Hashes the given string s. The seed h defaults to BASE_HASH. Setting h is equivalent to a hash_combine with h.
template <size_t N>
constexpr hash_t hash(const char (&str)[N], hash_t const h = BASE_HASH);
Hashes the given char array. Example: hash("my string").
template <typename T>
constexpr uint64_t hash(T const& buf, hash_t const h = BASE_HASH);
Hashes a given buffer (e.g. a std::vector<char>, std::string, etc.).
template <typename T> hash_t type2str_hash();
Hashes the type name of the given type T. Example: type2str_hash<int>().