
Format Documentation

Robert Nix edited this page Mar 19, 2017 · 11 revisions

This page documents the internal structures of file data loaded as assets in the Unity engine. To understand what Unity does with these files, it helps to first know how Unity's Entity-Component system works, and to read the relevant public Unity documentation on serialization.

File Types

SerializedFile

A SerializedFile is essentially a collection of Objects. *.assets files use this format. A SerializedFile is also the most important portion (CAB-...) of an AssetBundle file.

Header

The file begins with a 0x14 byte header. Values in this header are always stored big-endian.

struct SerializedFileHeader {
  int32 metadataSize;
  int32 fileSize;
  int32 version;
  int32 objectDataOffset;
  bool bigEndian;
  char _padding[3];
};
  • metadataSize: The size in bytes of the metadata section which immediately follows the header.
  • fileSize: The size in bytes of the entire SerializedFile.
  • version: This is the primary version indicator for this format, and "version" in this document refers to this value unless otherwise specified. (Note: in UnityPack, this is called "format")
  • objectDataOffset: The offset in bytes from the start of the file to the beginning of serialized object data.
  • bigEndian: When true, values in the remainder of this file are stored big-endian. When false, they are stored little-endian. Again, the values in the first 0x14 bytes of the file, i.e. this header, are always big-endian; only the endianness of values in the rest of the file is determined by this boolean.
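
The header layout above can be read with a fixed big-endian format string. The following is a minimal sketch, not a reference implementation; the names `parse_header` and `body_byte_order` are made up for illustration.

```python
import struct
from typing import NamedTuple

# Four big-endian int32 values, one bool, three padding bytes.
HEADER_FORMAT = ">iiii?3x"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 0x14 bytes


class SerializedFileHeader(NamedTuple):
    metadata_size: int
    file_size: int
    version: int
    object_data_offset: int
    big_endian: bool


def parse_header(data: bytes) -> SerializedFileHeader:
    """Parse the header from the first 0x14 bytes of a SerializedFile."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 0x14 bytes for the header")
    return SerializedFileHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))


def body_byte_order(header: SerializedFileHeader) -> str:
    """Return the struct byte-order prefix for everything after the header."""
    return ">" if header.big_endian else "<"
```

The byte-order prefix returned by `body_byte_order` is what a reader would prepend to every format string used for the metadata and object data.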

Metadata

The metadata section contains some variable-length strings. These are serialized with a null terminator; for example, "5.3.6p1" will be written 35 2E 33 2E 36 70 31 00, 8 bytes including the null terminator. There is no padding for alignment purposes.
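
As a small illustration of the string encoding described above, here is a sketch of a null-terminated string reader. The function name `read_cstring` is hypothetical, and decoding as UTF-8 is an assumption; the format description only shows ASCII content.

```python
from typing import Tuple


def read_cstring(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a null-terminated string starting at offset.

    Returns the decoded string and the offset of the byte that follows
    the terminator. No alignment padding is skipped.
    """
    end = data.index(b"\x00", offset)  # raises ValueError if unterminated
    # Assumption: the bytes are UTF-8 (ASCII-compatible).
    return data[offset:end].decode("utf-8"), end + 1
```

Applied to the bytes 35 2E 33 2E 36 70 31 00 at offset 0, this yields the string "5.3.6p1" and a next offset of 8.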

This list is in order of serialization:

  • string generatorVersion: This string contains the engine version of the generator of this file, e.g. "5.3.6p1".
  • int32 platform: An enum value giving the platform this file was built for. See the RuntimePlatform enum for possible values.
  • When version is >= 13:
    • bool hasTypeTrees: whether TypeTree data has been included in this metadata
    • int32 numTypes: the number of types of objects in this serialized file
    • For each type in numTypes:
      • When version is >= 17:
        • int32 classID: 0x72 (114: MonoBehaviour) for script types
        • int8 ???
        • int16 ???
      • Otherwise (version < 17):
        • int32 classID: Negative for script types
      • If classID indicates a script type:
        • char scriptHash[16]
      • char typeHash[16]
      • If hasTypeTrees:
        • TypeTree typeTree (see TypeTree serialization below)
  • Otherwise (version < 13):
    • int32 numTypes
    • For each type in numTypes:
      • int32 classID: Negative for script types
      • TypeTree typeTree
  • Object info
  • External references
  • ...
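
The start of this list can be sketched in code for the simplest case: version >= 13 with hasTypeTrees false. This is an illustrative sketch only. `read_metadata_prefix` and `TypeEntry` are hypothetical names, the two fields marked ??? are skipped without interpretation, and parsing stops before the object info.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

MONO_BEHAVIOUR = 114  # 0x72, the classID used for script types when version >= 17


class TypeEntry(NamedTuple):
    class_id: int
    script_hash: Optional[bytes]  # only present for script types
    type_hash: bytes


def read_metadata_prefix(
    data: bytes, offset: int, version: int, big_endian: bool
) -> Tuple[str, int, List[TypeEntry], int]:
    """Read generatorVersion, platform and the type list; return the next offset."""
    if version < 13:
        raise NotImplementedError("version < 13 stores a TypeTree for every type")
    order = ">" if big_endian else "<"

    end = data.index(b"\x00", offset)
    generator_version = data[offset:end].decode("utf-8")  # assumed UTF-8
    offset = end + 1

    fmt = order + "i?i"  # platform, hasTypeTrees, numTypes (no padding)
    platform, has_type_trees, num_types = struct.unpack_from(fmt, data, offset)
    offset += struct.calcsize(fmt)
    if has_type_trees:
        raise NotImplementedError("this sketch does not parse embedded TypeTrees")

    types = []
    for _ in range(num_types):
        (class_id,) = struct.unpack_from(order + "i", data, offset)
        offset += 4
        if version >= 17:
            offset += 3  # int8 and int16 of unknown meaning
            is_script = class_id == MONO_BEHAVIOUR
        else:
            is_script = class_id < 0
        script_hash = None
        if is_script:
            script_hash = data[offset:offset + 16]
            offset += 16
        type_hash = data[offset:offset + 16]
        offset += 16
        types.append(TypeEntry(class_id, script_hash, type_hash))
    return generator_version, platform, types, offset
```

The `offset` argument is where the metadata begins, i.e. directly after the 0x14-byte header, and `big_endian` comes from the header's bigEndian field.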

TypeTree

The TypeTree describes how values in individual objects are serialized. It is a tree structure in which each node is a struct field. Object serialization can be performed by traversing this tree depth-first and reading or writing according to the information at each node.

Type and field names in the newer TypeTree format are stored not as strings directly, but as offsets into one of two string buffers: a global string buffer, defined as a constant string in the engine; and a local string buffer, included locally in the TypeTree. These string buffers contain null-terminated strings stored sequentially. When the offset value has bit 31 set (i.e., is negative), that bit is masked off to get an offset in the global string buffer; when bit 31 is not set, the offset is in the local string buffer.
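
The offset rule can be expressed as a short lookup. This is a sketch under stated assumptions: `resolve_name` is a hypothetical helper, strings are decoded as UTF-8, and the caller supplies both buffers (the contents of the engine's global buffer are not reproduced here).

```python
def resolve_name(offset: int, local_buffer: bytes, global_buffer: bytes) -> str:
    """Look up a type or field name from a TypeTree string offset."""
    value = offset & 0xFFFFFFFF  # view the int32 as unsigned
    if value & 0x80000000:
        # Bit 31 set: mask it off and index the global buffer.
        buffer, position = global_buffer, value & 0x7FFFFFFF
    else:
        # Bit 31 clear: index the TypeTree's local buffer.
        buffer, position = local_buffer, value
    end = buffer.index(b"\x00", position)
    return buffer[position:end].decode("utf-8")
```

With a made-up global buffer `b"int\x00float\x00"` and local buffer `b"m_Name\x00m_Size\x00"`, the offset 0x80000004 resolves to "float" and the offset 7 resolves to "m_Size".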

  • When version == 10 or version >= 12:
    (new compact blob format)
    • int32 numNodes: Number of nodes in the tree
    • int32 stringBufferSize: Size in bytes of the local string buffer
    • For each node in numNodes:
      • int16 version: This field does not appear to be used by anything.
      • int8 depth: The depth in the tree of the current node. Nodes of the tree are serialized depth-first, so this number will increase when the current node is a child of the previous node, will stay the same when the current node is a sibling of the previous node, and will decrease when the current node is a sibling of one of the previous node's parents.
      • bool array: When true, this node is a special array node -- its first child (size) in the tree is the size in elements of the array, and its next child (data) is serialized in a loop for each element of the array.
      • int32 type: The string buffer offset of the type name of this node.
      • int32 name: The string buffer offset of the field name of this node.
      • int32 size: The expected size in bytes when this node (including children) is serialized. This is -1 for variable-sized fields, such as arrays or structs that have arrays as children.
      • int32 index: This is just an index of the node in the flat depth-first list of nodes.
      • int32 flags: Flags for serialization and miscellaneous information.
        • 0x4000: the stream should be aligned after serializing this field
  • Otherwise (version != 10 and version < 12):
    (old format; fields are described above)
    • string type
    • string name
    • int32 size
    • int32 index
    • int32 array: This is still a bool (0 or 1); it is simply 4 bytes wide here, as opposed to 1 byte in the blob format
    • int32 version
    • int32 flags
    • int32 numChildren
      • For each child in numChildren, recurse starting at string type
        Note that this format lists children explicitly, whereas the blob format relies only on depth to determine child/parent relationships.
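
To make the blob format concrete, here is a sketch that reads the flat node list and then rebuilds parent/child links from the depth values. The names `Node`, `read_blob_nodes` and `link_children` are hypothetical. The sketch reads only the node records; it does not read the local string buffer.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

ALIGN_AFTER = 0x4000  # flag: align the stream after serializing this field


@dataclass
class Node:
    version: int
    depth: int
    is_array: bool
    type_offset: int
    name_offset: int
    size: int
    index: int
    flags: int
    children: List["Node"] = field(default_factory=list)


def read_blob_nodes(
    data: bytes, offset: int, big_endian: bool
) -> Tuple[List[Node], int, int]:
    """Return (nodes, stringBufferSize, offset just past the last node)."""
    order = ">" if big_endian else "<"
    num_nodes, string_buffer_size = struct.unpack_from(order + "ii", data, offset)
    offset += 8
    node_format = order + "hb?iiiii"  # int16, int8, bool, then five int32
    node_size = struct.calcsize(node_format)  # 24 bytes per node
    nodes = []
    for _ in range(num_nodes):
        nodes.append(Node(*struct.unpack_from(node_format, data, offset)))
        offset += node_size
    return nodes, string_buffer_size, offset


def link_children(nodes: List[Node]) -> Node:
    """Rebuild the tree from the depth-first list and return the root."""
    stack: List[Node] = []  # stack[d] is the most recent node at depth d
    for node in nodes:
        del stack[node.depth:]  # drop everything at this depth or deeper
        if stack:
            stack[-1].children.append(node)
        stack.append(node)
    return nodes[0]
```

Keeping a stack indexed by depth means each node is attached to the most recent node one level above it, which is exactly the relationship the depth field encodes.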

Asset Bundles

An AssetBundle file contains a SerializedFile containing Objects which are loaded dynamically by way of scripts. The SerializedFile in an AssetBundle container also has an AssetBundle Object, which contains a lookup from path name to individual objects in the bundle. For information on how this name lookup is usually created, see the Unity documentation.

Flat resource

*.resource and *.resS files are flat resource files, generally audio or texture data, and are viewed by Unity just as a sequence of bytes. In the case of audio data, the bytes are FMOD sound bank files that can be passed directly to FMOD to create a playable sound. In the case of texture data, the bytes are texture image data. The position and length of each segment of audio or image data are in the asset file of the same name, within individual AudioClip or Texture2D objects.
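
Because a flat resource file is just a byte sequence, extracting one clip or texture amounts to a seek and a read. A minimal sketch, assuming the position and length have already been read from the corresponding AudioClip or Texture2D object; `read_segment` is a hypothetical helper name.

```python
import io
from typing import BinaryIO


def read_segment(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Return length bytes starting at offset from an open resource file."""
    stream.seek(offset)
    chunk = stream.read(length)
    if len(chunk) != length:
        raise ValueError("segment extends past the end of the resource file")
    return chunk
```

In practice the stream would be a file opened in binary mode; an `io.BytesIO` object works the same way for testing.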
