BenVoxel is an open standard using sparse voxel octrees to compress voxel model geometry for file storage (with optional metadata) developed by Ben McLean. This specification is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
The idea is to sacrifice processing speed to get a very small storage size for the geometry while keeping the implementation relatively simple and also allowing for extensive metadata to be optionally included. Because we assume that CPU and RAM are cheap but file storage / bandwidth is expensive, it should make sense to accept the complexity of a sparse voxel octree in order to get a smaller file size.
There will be no license requirements restricting usage, but this format is designed for small voxel models intended for video games, animations or other entertainment, artistic or aesthetic use cases, so its design might not be ideal for unrelated academic, scientific, medical, industrial or military applications. Also, be aware that none of this has been engineered to provide security features (such as anti-cheat, checksums or length constraints for overflow protection) so use at your own risk.
The BenVoxel standard describes two inter-related file formats. One is a binary format with the extension `.ben` and the other is a JSON format (recommended extension `.ben.json`) designed to contain all of the same information as the binary format but with the metadata kept human-readable. The JSON format uses Z85 encoding for the geometry data. A game developer might keep their voxel models in the JSON format during development but automatically convert to the binary format (potentially stripping out metadata) as part of their release pipeline.
The BenVoxel standard adopts the MagicaVoxel Z+up right-handed 3D coordinate system where: X+ is right/east (width), Y+ is forward/north (depth), and Z+ is up (height). 0, 0, 0 is the bottom left nearest / southwestern corner of the model. Negative coordinates cannot contain voxels. Models are expected to be aligned so that their lowest edge occupies coordinate value 0 on all three axes.
The empty string key has special meaning in several contexts:
- In models: Indicates the default model.
- In palettes: Indicates the default palette.
- In points: Specifies the model origin.
- In properties: Specifies the voxel scale.
When converting between JSON and binary formats:
- Keys
  - JSON keys map directly to `KeyString` fields in binary format.
  - Empty string keys retain their special meanings across formats.
- Geometry Data
  - Binary: Direct octree bytes.
  - JSON: DEFLATE (RFC 1951) compressed, Z85-encoded octree bytes.
- Empty string keys must retain their special meaning.
- Size constraints are mandatory for compatibility.
- Out-of-bounds voxels are invalid and may be discarded without warning during reading as an implementation detail.
- Files containing out-of-bounds voxels should still be readable if otherwise valid.
- Key strings longer than 255 characters are invalid. Key strings that start or end with whitespace are invalid. However, it is recommended as an implementation detail to first trim whitespace, then truncate the keys with last-in-wins dictionary behavior.
- Implementations are required to tolerate any unused 0-padding at the end of decompressed geometry data.
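The recommended key handling (trim, then truncate, with last-in-wins behavior) could be sketched as follows. This is only an illustration: the function name `normalize_keys` is hypothetical, and it assumes the 255 limit is counted in UTF-8 bytes because the binary `KeyString` length field is a single byte.

```python
def normalize_keys(pairs):
    """Trim whitespace, truncate to 255 UTF-8 bytes, and let later keys win."""
    result = {}
    for key, value in pairs:
        key = key.strip()
        # Assumption: the limit applies to the encoded UTF-8 byte length.
        encoded = key.encode("utf-8")[:255]
        # Drop a multi-byte character that was cut in half by the truncation,
        # then trim again in case the cut exposed trailing whitespace.
        key = encoded.decode("utf-8", errors="ignore").rstrip()
        result[key] = value  # a later duplicate overwrites an earlier one
    return result
```

For example, `normalize_keys([(" a ", 1), ("a", 2)])` collapses both entries into the single key `"a"` holding the later value.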
A JSON schema for documentation purposes only (not providing validation for security) is included in the file `benvoxel.schema.json`. All the objects and properties in the JSON format correspond directly to chunks or fields in the binary format.
The JSON objects map to binary chunks as follows:
JSON object | Binary chunk |
---|---|
Root | `BENV` |
`metadata` | `DATA` |
`models` | `MODL` |
`properties` | `PROP` |
`points` | `PT3D` |
`palettes` | `PALC` |
`geometry` | `SVOG` |
Both the binary and JSON formats include version information. In the binary format, this is a field in the `BENV` chunk. In the JSON format, this is in a root key called `version`. The version should be compared alphanumerically as a string, with higher values indicating newer versions.
Implementations should rely on the `version` property/field within the file for determining BenVoxel format feature support, not the schema version.
The binary format was inspired by the classic RIFF structure.
All types are little-endian.
Three string types are used:
- `FourCC`: 4 byte ASCII chunk identifiers.
- `KeyString`: Starts with one unsigned byte for length, followed by a UTF-8 string of that length. Empty string is valid, but `KeyString`s that begin or end with whitespace are invalid. Duplicate (identical / non-unique) keys within the same sequence are invalid. It is recommended that implementations handle duplicate keys with last-in-wins dictionary behavior.
- `ValueString`: Starts with an unsigned 32-bit integer for length, followed by a UTF-8 string of that length.
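As a rough illustration of the two length-prefixed string types, the sketch below reads and writes them with Python's `struct` module. The function names are hypothetical, and the length prefix is assumed to count UTF-8 bytes rather than characters.

```python
import io
import struct


def write_key_string(stream, text):
    """Write a KeyString: one unsigned length byte, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("KeyString is limited to 255 bytes")
    stream.write(struct.pack("<B", len(data)))
    stream.write(data)


def read_key_string(stream):
    """Read a KeyString written by write_key_string."""
    (length,) = struct.unpack("<B", stream.read(1))
    return stream.read(length).decode("utf-8")


def write_value_string(stream, text):
    """Write a ValueString: little-endian unsigned 32-bit length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    stream.write(struct.pack("<I", len(data)))
    stream.write(data)


def read_value_string(stream):
    """Read a ValueString written by write_value_string."""
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length).decode("utf-8")
```

A round trip through an in-memory `io.BytesIO` buffer is enough to exercise these helpers.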
All chunks have:
- `FourCC`, 4 bytes: an ASCII identifier for this chunk (examples are "`FMT `" and "`DATA`"; note the space in "`FMT `").
- `ChunkLength`, 4 bytes: an unsigned 32-bit integer with the length of this chunk (excluding this field itself and the chunk identifier).
- `ChunkData`, variable-sized field: the chunk data itself, of the size given in the previous field.

This applies to all chunks, so this information won't be repeated in the individual chunk type descriptions.
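The common chunk layout described above might be read and written along these lines. This is a minimal sketch with hypothetical helper names, not the reference implementation's API.

```python
import io
import struct


def write_chunk(stream, fourcc, data):
    """Write FourCC, little-endian ChunkLength, then ChunkData."""
    stream.write(struct.pack("<4sI", fourcc.encode("ascii"), len(data)))
    stream.write(data)


def read_chunk(stream):
    """Return (fourcc, data) for the next chunk, or None at end of stream."""
    header = stream.read(8)
    if len(header) < 8:
        return None
    fourcc, length = struct.unpack("<4sI", header)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("chunk is shorter than its ChunkLength field claims")
    return fourcc.decode("ascii"), data
```

Note that `ChunkLength` counts only `ChunkData`, so a chunk occupies `8 + ChunkLength` bytes in total.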
BenVoxel binary files start with a `BENV` chunk which contains the entire file and corresponds to the root object in the JSON format. It contains:
- `Version`: One `KeyString` for version information. Higher alphanumeric comparison indicates higher version.
- The remaining data is compressed using raw DEFLATE (RFC 1951) and contains:
  - `Global`: One `DATA` chunk for global metadata. (optional)
  - `Count`: One unsigned 16-bit integer for the number of models.
  - For each model:
    - `Key`: One `KeyString` for the model key. Empty string key indicates the default model.
    - `Model`: One `MODL` chunk.

The size of the compressed data can be determined by subtracting the size of the `Version` field (the content of its unsigned 1-byte length field plus 1) from the `BENV` chunk size.
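To make the size arithmetic concrete, here is a hedged sketch of wrapping and unwrapping the `BENV` envelope using Python's `zlib` with a negative `wbits` value, which selects raw DEFLATE. The function names are hypothetical, the version string `"0.0"` used with them is only a placeholder, and the body is treated as opaque bytes rather than parsed into its `DATA` and `MODL` chunks.

```python
import struct
import zlib


def pack_benv(version, body):
    """Build a BENV chunk: Version KeyString followed by the raw-DEFLATE body."""
    version_bytes = version.encode("utf-8")
    compressor = zlib.compressobj(level=9, wbits=-15)  # -15 means raw DEFLATE
    compressed = compressor.compress(body) + compressor.flush()
    chunk_data = struct.pack("<B", len(version_bytes)) + version_bytes + compressed
    return b"BENV" + struct.pack("<I", len(chunk_data)) + chunk_data


def unpack_benv(file_bytes):
    """Return (version, decompressed body) from a BENV chunk."""
    fourcc, chunk_length = struct.unpack_from("<4sI", file_bytes, 0)
    if fourcc != b"BENV":
        raise ValueError("not a BenVoxel binary file")
    chunk_data = file_bytes[8:8 + chunk_length]
    version_length = chunk_data[0]
    version = chunk_data[1:1 + version_length].decode("utf-8")
    # Compressed size = chunk size minus (length byte + version bytes).
    compressed_size = chunk_length - (version_length + 1)
    compressed = chunk_data[1 + version_length:1 + version_length + compressed_size]
    body = zlib.decompressobj(wbits=-15).decompress(compressed)
    return version, body
```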
The global metadata chunk must be omitted if it is empty. Global metadata exists to keep the model metadata DRY: all global metadata should be treated as if it had been included in every model's metadata, except where an individual model's metadata contains an identically-named key that overrides it.
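One way to picture this inheritance rule is a per-section dictionary merge, sketched below. The function name `effective_metadata` and the plain-dictionary representation are assumptions for illustration, as is the choice to apply overrides key by key within each of the properties, points and palettes sections.

```python
def effective_metadata(global_meta, model_meta):
    """Combine global and per-model metadata, with the model's keys winning."""
    merged = {}
    for section in ("properties", "points", "palettes"):
        combined = dict(global_meta.get(section, {}))
        # Identically-named keys in the model override the global values.
        combined.update(model_meta.get(section, {}))
        if combined:  # empty sections are left out, mirroring the omission rule
            merged[section] = combined
    return merged
```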
The `DATA` chunk corresponds to the `metadata` key in the JSON format. It contains:
- `Properties`: One `PROP` chunk. (optional)
- `Points`: One `PT3D` chunk. (optional)
- `Palettes`: One `PALC` chunk. (optional)

Empty child chunks must be omitted. Empty `DATA` chunks (where all three child chunks are empty) must be omitted from their parent chunk.
The `MODL` chunk corresponds to one of the `models` objects in the JSON format. It contains:
- `Metadata`: One `DATA` chunk. (optional)
- `Geometry`: One `SVOG` chunk.

The metadata chunk must be omitted if it is empty.
The `PROP` chunk holds key-value pairs for arbitrary metadata.
It corresponds to one or more of the `properties` objects in the JSON format. It contains:
- `Count`: One unsigned 16-bit integer for the number of properties.
- For each property:
  - `Key`: One `KeyString` for the property key. Empty string key, if present, specifies the scale in meters of each voxel. This can be either a single decimal number applied to all dimensions (e.g. `1` for Minecraft-style 1m^3 voxels) or three comma-separated decimal numbers for width, depth, and height respectively (e.g. `2.4384,2.4384,2.92608` for Wolfenstein 3-D walls which are 8ft x 8ft x 9.6ft). Scale values must be positive decimal numbers in C# decimal format. If no empty string key is present then the scale is unspecified.
  - `Value`: One `ValueString` for the property value.
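Interpreting the scale property could look roughly like the following sketch. The function name `parse_scale` is hypothetical, and Python's `decimal.Decimal` is used here only as an approximation of C# decimal parsing, which it does not match exactly.

```python
from decimal import Decimal


def parse_scale(properties):
    """Return (width, depth, height) in meters per voxel, or None if unspecified."""
    text = properties.get("")
    if text is None:
        return None  # no empty string key: scale is unspecified
    parts = [Decimal(part.strip()) for part in text.split(",")]
    if len(parts) == 1:
        parts = parts * 3  # a single number applies to all three dimensions
    if len(parts) != 3 or any(part <= 0 for part in parts):
        raise ValueError("scale must be one or three positive decimal numbers")
    return tuple(parts)
```

With the examples from the text, `"1"` yields a uniform one meter scale and `"2.4384,2.4384,2.92608"` yields separate width, depth and height values.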
The `PT3D` chunk holds named 3D points in space as `[x, y, z]` arrays. It uses 32-bit signed integers to allow points to be placed outside model bounds (including negative coordinates) for purposes like specifying offsets.
It corresponds to one or more of the `points` objects in the JSON format. It contains:
- `Count`: One unsigned 16-bit integer for the number of points.
- For each point:
  - `Key`: One `KeyString` for the point key.
  - `Coordinates`: Three signed 32-bit integers for the X, Y and Z coordinates.

Empty string key specifies the origin of the model. The default origin is defined as `[width >> 1, depth >> 1, 0]` (the bottom center). If the origin is equal to the default then this key should be omitted.
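The default origin rule amounts to a pair of right shifts, as in this small sketch. The helper names are hypothetical, and points are assumed to be stored in a plain dictionary keyed by name.

```python
def default_origin(size):
    """Bottom center of a model whose size is (width, depth, height)."""
    width, depth, _height = size
    return [width >> 1, depth >> 1, 0]


def model_origin(points, size):
    """Origin from the empty string key, falling back to the default."""
    return list(points.get("", default_origin(size)))


def drop_redundant_origin(points, size):
    """Return a copy of points without an origin that equals the default."""
    cleaned = dict(points)
    if "" in cleaned and list(cleaned[""]) == default_origin(size):
        del cleaned[""]
    return cleaned
```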
The `PALC` chunk holds named color palettes.
It corresponds to one or more `palettes` objects in the JSON format. It contains:
- `PaletteCount`: One unsigned 16-bit integer for the number of palettes.
- For each palette:
  - `Key`: One `KeyString` for the palette key. Empty string key indicates the default palette.
  - `ColorCount`: One unsigned byte representing the number of colors minus one, with a range of 0-255 representing 1-256 colors. A value of `0` indicates 1 color, and a value of `255` indicates 256 colors. This range always includes the background color at index zero, while the remaining indices correspond to the voxel payload bytes, which is the reason for the 256 color limit.
  - `Colors`: Each color has four bytes of Red, Green, Blue and Alpha from left to right, so the length will be `(ColorCount + 1) << 2` bytes.
  - `HasDescriptions`: One unsigned byte with value `0` to indicate no descriptions, or any other value to indicate that descriptions are included.
  - `Descriptions`: A series of `ColorCount + 1` `ValueString`s describing the colors. Only included if `HasDescriptions` is not `0`. A description should stay associated with the color it describes even when the colors or their order changes. The first line should be a short, human-readable message suitable for display as a tooltip in an editor. Additional lines can contain extra data such as material settings, which editors should preserve even if they don't use it.
`SVOG` stands for "Sparse Voxel Octree Geometry". It corresponds to a `geometry` object in the JSON format. It contains:
- `Size`: Three 16-bit unsigned integers defining model extents on X, Y, and Z axes. Valid voxel coordinates range from `0` to `size - 1` for each axis. Any geometry data present at coordinates equal to or greater than the corresponding size value is invalid and may be retained or safely discarded without warning as an implementation detail. For example, in a model of size `[5,5,5]`, coordinates `[4,4,4]` are valid while coordinates `[5,4,4]` are out of bounds. Selectively discarding out-of-bounds voxels when deserializing is recommended but not required. However, it is also strongly recommended that files containing such out-of-bounds voxels which are otherwise valid should still be readable.
- `Geometry`: A variable length series of bytes which encodes the voxels according to the "Geometry" section of this document.
Both the JSON and binary formats use the same sparse voxel octree data format, except that only for the JSON format, serializing the geometry data requires the following additional processing steps:
- The sparse voxel octree data is first compressed using raw DEFLATE (RFC 1951).
- The compressed data is then encoded using Z85 (ZeroMQ Base-85) to ensure valid JSON characters. This includes automatically padding the end of the data with zeroes to make the length a multiple of 4 as a requirement of Z85 encoding. All implementations are required to tolerate having these extra zeroes optionally present at the end of the geometry data even though there is no need to add this padding in the binary format.
- Finally, the compressed and encoded string is stored in the `z85` property.

Deserializing from the JSON format reverses the process:
- First, the string value of the `z85` property is decoded back into compressed binary data.
- The compressed binary data is then decompressed using raw DEFLATE (RFC 1951).
- Finally, the decompressed binary data is read to deserialize the sparse voxel octree.
This ensures that the non-human-readable section of the JSON files is compressed to a minimum length while only using JSON-safe characters. Because DEFLATE output can differ between implementations, platforms and compression settings, the compressed and encoded string is not guaranteed to match between two runs with the same uncompressed and unencoded input data. However, the decoded and decompressed output data will be binary identical to the (padded) input data because DEFLATE data compression is lossless.
Unlike the JSON format, the binary format uses the raw octree data directly without any additional compression or encoding applied to it. The binary format prefixes the model size to the geometry data in the `SVOG` chunk while the JSON format includes the model size in a separate `size` property instead, omitting the model size from the compressed and encoded `z85` property.
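The JSON geometry pipeline (raw DEFLATE, zero-padding, then Z85) can be sketched with the standard library alone. The Z85 routines below are a small hand-written illustration of the ZeroMQ encoding rather than a vetted library, and all function names are hypothetical.

```python
import struct
import zlib

Z85_CHARS = ("0123456789abcdefghijklmnopqrstuvwxyz"
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")


def z85_encode(data):
    """Encode bytes as Z85, zero-padding to a multiple of 4 bytes first."""
    data = bytes(data) + b"\x00" * (-len(data) % 4)
    out = []
    for (value,) in struct.iter_unpack(">I", data):  # Z85 frames are big-endian
        digits = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            digits.append(Z85_CHARS[digit])
        out.extend(reversed(digits))
    return "".join(out)


def z85_decode(text):
    """Decode a Z85 string back into bytes (including any zero padding)."""
    if len(text) % 5:
        raise ValueError("Z85 text length must be a multiple of 5")
    out = bytearray()
    for i in range(0, len(text), 5):
        value = 0
        for char in text[i:i + 5]:
            value = value * 85 + Z85_CHARS.index(char)
        out += struct.pack(">I", value)
    return bytes(out)


def geometry_to_z85(octree_bytes):
    """Serialize octree bytes for the JSON z85 property."""
    compressor = zlib.compressobj(level=9, wbits=-15)  # raw DEFLATE
    return z85_encode(compressor.compress(octree_bytes) + compressor.flush())


def z85_to_geometry(text):
    """Recover octree bytes; padding after the DEFLATE stream is ignored."""
    return zlib.decompressobj(wbits=-15).decompress(z85_decode(text))
```

Using a decompression object here means any zero bytes appended for Z85 alignment end up as unused trailing input instead of causing an error.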
An individual voxel is defined as having four data elements.
The first three data elements of a voxel are 16-bit unsigned integers for the X, Y and Z coordinates. Negative coordinates and coordinates larger than 65,534 are unsupported.
The fourth data element of a voxel is the payload of one byte for the index to reference a color or material, where 0 is reserved for an empty or absent voxel, leaving 255 usable colors or materials.
Models (sets of voxels with unique coordinates) are limited by 16-bit unsigned integer bounds, so valid geometry can range from coordinate values of 0 to 65,534 inclusive. Models are expected to follow the coordinate system defined above.
To serialize a model, geometry is structured as a sparse voxel octree for compression, so that the coordinates of the voxels are implied from their positions in the octree and empty spaces are not stored.
The octree has a fixed depth of 16 levels, corresponding to the 16 bits of addressable space in the unsigned 16-bit integer spatial coordinates. The first 15 levels consist of only Branch nodes, while the 16th and final level contains only Leaf nodes.
There are four node types, including two types of branches and two types of leaves.
All nodes start with a 1-byte header, composed of bits from left to right:
- Header bit 7: Node type indicator
  - `0`: Branch node
  - `1`: Leaf node
- Header bit 6: Branch/Leaf subtype
  - For Branch nodes:
    - `0`: Regular branch with child nodes
    - `1`: Collapsed branch containing identical values
  - For Leaf nodes:
    - `0`: 2-byte payload Leaf
    - `1`: 8-byte payload Leaf
- Header bits 5-3:
  - For regular Branch nodes: number of children (1-8) minus one
  - For 2-byte Leaf nodes: ZYX octant of the foreground voxel
  - For other node types: unused
- Header bits 2-0: ZYX octant indicator
  - These bits encode the node's position relative to its parent in ascending Z, Y, X order
  - `0` represents the negative direction and `1` represents the positive direction for each axis
  - Examples:
    - `000`: (-Z, -Y, -X) octant
    - `111`: (+Z, +Y, +X) octant
    - `001`: (-Z, -Y, +X) octant
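The header layout can be illustrated with a few bit operations. This is a sketch with hypothetical function names; the octant helper follows the examples above, where the lowest bit is X, the middle bit is Y and the highest bit is Z.

```python
def pack_header(is_leaf, subtype, bits_5_3, octant):
    """Assemble a node header byte from its four fields."""
    return (int(is_leaf) << 7) | ((subtype & 1) << 6) | ((bits_5_3 & 7) << 3) | (octant & 7)


def unpack_header(byte):
    """Split a node header byte into its four fields."""
    return {
        "is_leaf": bool(byte & 0x80),  # bit 7: 0 = Branch, 1 = Leaf
        "subtype": (byte >> 6) & 1,    # bit 6: regular/collapsed or 2-byte/8-byte
        "bits_5_3": (byte >> 3) & 7,   # child count minus one, or foreground octant
        "octant": byte & 7,            # bits 2-0: position within the parent
    }


def octant_to_offsets(octant):
    """Return (x, y, z) offsets of 0 or 1 for a ZYX octant value."""
    return (octant & 1, (octant >> 1) & 1, (octant >> 2) & 1)
```

For instance, the header byte `0x80` unpacks to a 2-byte payload Leaf in octant `000` whose foreground voxel is also at octant `000`.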
Branch nodes come in two forms:
- Regular branches contain up to eight child nodes. Header bits 5-3 indicate the number of children minus one. The header byte is followed by the child nodes, each with a specific octant position corresponding to its location in the parent's 2x2x2 grid using the same ZYX octant encoding described above.
- Collapsed branches abbreviate full subtrees of cubes where all descendant leaves contain the same non-zero value. The header is followed by a single byte specifying this value. If the value would be zero, the node should not exist at all since it represents entirely empty space.
On the 16th (last) level of the octree, all children will be Leaf nodes, but the children will all be Branch nodes on every other level.
Leaf nodes represent the contents of a 2x2x2 voxel cube and come in two forms: 2-byte payload Leaf nodes and 8-byte payload Leaf nodes.
2-byte payload Leaf nodes represent cubes in which all the voxels are the same except one, called the foreground voxel. The header byte (with the foreground voxel's position encoded in bits 5-3) is followed by two payload bytes: the first for the foreground voxel and the second for all background voxels. Either the foreground or background value may be zero, representing empty space in those positions, but if both would be zero, the leaf should not exist because it would only contain empty space. The only exception is to indicate a completely empty model.
The octant/position of the foreground voxel is indicated by header bits 5-3, using the same ZYX octant encoding scheme as described in the Node Headers section above.
The background voxel value should be repeated in all octants/positions except for the foreground voxel.
8-byte payload Leaf nodes represent cubes of arbitrary values. The header byte is followed by the eight payload bytes of a 2x2x2 voxel cube in ascending Z, Y, X order from left to right, with 0 representing empty voxels.
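Both leaf forms can be expanded into the same eight-value cube, as the following sketch shows. The function name `expand_leaf` is hypothetical, and the code assumes that payload position `i` of an 8-byte Leaf corresponds to octant value `i` (X varying fastest, then Y, then Z), which is one reading of "ascending Z, Y, X order".

```python
def expand_leaf(header, payload):
    """Return the eight voxel payload bytes of a leaf, indexed by octant value."""
    if header & 0x40:
        # 8-byte payload Leaf: the values are stored explicitly.
        return list(payload[:8])
    # 2-byte payload Leaf: one foreground voxel, seven background voxels.
    foreground_octant = (header >> 3) & 7
    foreground, background = payload[0], payload[1]
    cube = [background] * 8
    cube[foreground_octant] = foreground
    return cube
```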
An empty model is represented by the following 18 bytes in hexadecimal:
- 15 bytes of `00` for branch node headers
- 1 byte of `80` for a 2-byte payload Leaf header
- 2 bytes of `00` for the payload (both foreground and background values zero)
The 15 branch node headers correspond to the 15 levels of the octree needed to address a space with 16-bit integer coordinates (2^16 = 65,536, except remember to subtract one for zero-based indexing). The 16th level corresponds to the leaf node containing all empty space.
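Putting the node types together, a recursive decoder might look like the sketch below. It is an illustration under stated assumptions, not the reference implementation: the name `decode_octree` is hypothetical, the input is assumed to be well-formed, the root node's octant bits are treated as unused, and out-of-bounds voxels are discarded using a `size` tuple of (width, depth, height).

```python
EMPTY_MODEL = bytes(15) + b"\x80\x00\x00"  # 15 branch headers, leaf header, payload


def decode_octree(data, size):
    """Decode octree bytes into {(x, y, z): payload}, clipping to size."""
    voxels = {}
    pos = 0

    def put(x, y, z, value):
        if value and x < size[0] and y < size[1] and z < size[2]:
            voxels[(x, y, z)] = value

    def read_node(level, px, py, pz):
        nonlocal pos
        header = data[pos]
        pos += 1
        edge = 1 << (17 - level)  # level 1 spans 65,536 voxels, level 16 spans 2
        octant = header & 7 if level > 1 else 0  # assumption: root octant unused
        x = px + (octant & 1) * edge
        y = py + ((octant >> 1) & 1) * edge
        z = pz + ((octant >> 2) & 1) * edge
        if header & 0x80:  # Leaf node
            if header & 0x40:  # 8-byte payload
                cube = data[pos:pos + 8]
                pos += 8
            else:  # 2-byte payload: foreground, then background
                cube = [data[pos + 1]] * 8
                cube[(header >> 3) & 7] = data[pos]
                pos += 2
            for i, value in enumerate(cube):
                put(x + (i & 1), y + ((i >> 1) & 1), z + ((i >> 2) & 1), value)
        elif header & 0x40:  # Collapsed branch: one value fills the whole cube
            value = data[pos]
            pos += 1
            for vz in range(z, min(z + edge, size[2])):
                for vy in range(y, min(y + edge, size[1])):
                    for vx in range(x, min(x + edge, size[0])):
                        put(vx, vy, vz, value)
        else:  # Regular branch: bits 5-3 hold the child count minus one
            for _ in range(((header >> 3) & 7) + 1):
                read_node(level + 1, x, y, z)

    read_node(1, 0, 0, 0)
    return voxels  # any zero padding after the root node is never read
```

Decoding `EMPTY_MODEL` yields an empty dictionary, and changing the final three bytes to `88 05 00` places a single voxel with payload 5 at `(1, 0, 0)`.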
An MIT-licensed reference implementation is provided as a .NET Standard 2.0 library written in C# 13.0 via PolySharp.
Package | License | Included Via |
---|---|---|
Cromulent.Encoding.Z85 | MIT | NuGet |
PolySharp | MIT | NuGet |
// Read a binary file and write it back out.
BenVoxelFile benVoxelFile = BenVoxelFile.Load("filename.ben");
benVoxelFile.Save("filename.ben");
// The same calls handle the JSON format.
BenVoxelFile benVoxelFile = BenVoxelFile.Load("filename.ben.json");
benVoxelFile.Save("filename.ben.json");
foreach (Voxel voxel in benVoxelFile.Models[""].Geometry)