
luggages/mus

 
 


MUS Serialization Format

MUS (Marshal, Unmarshal, Size) is a binary format that tries to be as simple as possible. It contains no field names or other metadata, only values (with a few exceptions, such as the nil pointer flag and the length prefix for string, list, and map data types). For example, an object of type Foo:

type Foo struct {
  a int
  b bool
  c string
}

in MUS format may look like this:

Foo{a: 300, b: true, c: "hi"}    MUS->    [216 4 1 4 104 105]
, where
- [216 4] - value of the field a (300, ZigZag and Varint encoded)
- [1] - value of the field b
- [4] - length of the field c (2, ZigZag encoded)
- [104 105] - value of the field c

Note that object fields are encoded in declaration order: the first field, then the second, then the third, and so on.

If you need metadata, you can easily place it in your data type. A good example of this can be found in the Versioning section.

Format Features

  • All uint (uint64, uint32, uint16, uint8, uint), int, and float data types can be encoded in one of two ways: Varint or Raw. Raw encoding always writes every byte of the number (in little-endian order), so Varint usually needs fewer bytes but is a bit slower. For example, the number 40 of type uint64 takes only one byte in Varint encoding:

    40    Varint->    [40]
    

    The same number in Raw encoding takes 8 bytes:

    40    Raw->    [40 0 0 0 0 0 0 0]
    

    In general, Raw encoding uses:

    • int64, uint64, float64 - 8 bytes
    • int32, uint32, float32 - 4 bytes
    • int16, uint16 - 2 bytes
    • int8, uint8 - 1 byte

    Raw encoding becomes more efficient than Varint, in both speed and size, only for large numbers, where Varint needs more bytes than the fixed width. For uint types, these "large numbers" are:

    • >= 2^56 - for uint64
    • >= 2^28 - for uint32
    • >= 2^14 - for uint16
    • absent - for uint8, both encodings use one byte for uint8 numbers
  • ZigZag encoding is used for signed to unsigned integer mapping.

  • Strings and lists are encoded as a length (int type) followed by the values; maps are encoded as a length followed by the key/value pairs.

    string = "hello world"    MUS->    [22 104 101 108 108 111 32 119 111 114 108 100]
    , where
    - [22] - is the length of the string
    - [104 101 108 108 111 32 119 111 114 108 100] - string
    
    list = {"hello", "world"}    MUS->    [4 10 104 101 108 108 111 10 119 111 114 108 100]
    , where
    - [4] - length of the list
    - [10] - length of the first elem
    - [104 101 108 108 111] - first elem
    - [10] - length of the second elem
    - [119 111 114 108 100] - second elem
    
    map = {1: "hello", 2: "world"}    MUS->    [4 2 10 104 101 108 108 111 4 10 119 111 114 108 100]
    , where
    - [4] - length of the map
    - [2] - first key
    - [10] - length of the first value
    - [104 101 108 108 111] - first value
    - [4] - second key
    - [10] - length of the second value
    - [119 111 114 108 100] - second value
    
  • Booleans and bytes are encoded as a single byte.

    true    MUS->    [1]
    false    MUS->    [0]
    
  • Pointers are encoded with a nil flag: a nil pointer is encoded as 1, and a non-nil pointer as 0 followed by the pointed-to value.

    nil    MUS->    [1]
    , where
    - [1] - nil flag
    
    &"hello world"    MUS->    [0 22 104 101 108 108 111 32 119 111 114 108 100]
    , where
    - [0] - nil flag
    - [22] - length of the string
    - [104 101 108 108 111 32 119 111 114 108 100] - string
    

Versioning

The MUS format has no explicit versioning support, but you can always do the following:

// Add a version field as the first field of each type,
// so that it ends up in the first byte of the encoded data.
type TypeV1 struct {
  version byte
  ...
}

type TypeV2 struct {
  version byte
  ...
}

// Check the version byte before calling Unmarshal.
switch buf[0] {
case 1:
  typeV1, _, err = UnmarshalTypeV1(buf)
  // ...
case 2:
  typeV2, _, err = UnmarshalTypeV2(buf)
  // ...
default:
  return ErrUnsupportedVersion
}

Moreover, having a version field is highly recommended. With it, you will always be ready for changes in the type structure or in the MUS format itself.

Streaming

The MUS format is suitable for streaming: all the receiving side needs to know is the data type.

Serializers
