- TMS – Time series compression library
- Summary
- Key Features
- Technical Highlights
- Skills Demonstrated
- Current Project State
- Project state
- Rust modules, struct, and methods
- Explaining the design
- About compression
- Differential encoding compression
- Binary packing compression
- Variable integer compression
- SIMD instructions
- The
tms
compression algorithm - Three-character naming
- Figure style and tool
TMS is a specialized time series compression library written in Rust, designed specifically for financial market data with a focus on optimal performance through SIMD (Single Instruction Multiple Data) operations.
High-performance time series compression library in Rust
- Designed for financial data with millisecond precision
- Optimized for fast decompression using SIMD instructions
- Implements advanced compression techniques including differential encoding, binary packing, and variable integer compression
- Supports both in-memory caches and on-disk storage
- Based on vectorization research by Daniel Lemire et al.
- Currently supports 256-bit SIMD vectors on x86 chips
- Includes features for compressing and decompressing date-time series data
tms
= Time Series
tms
is based on the work described in:
- "Decoding billions of integers per second through vectorization" by Daniel Lemire and Leonid Boytsov.
- "SIMD Compression and the Intersection of Sorted Integers" by Daniel Lemire, Leonid Boytsov, and Nathan Kurz.
- High-Performance Compression
- Uses SIMD instructions (256-bit vectors) for fast compression/decompression
- Implements differential encoding with binary packing for efficient storage
- Employs variable integer (varint) compression for additional space savings
- Financial Market Optimization
- Specifically designed for NYSE trading hours (9:30 AM - 4:00 PM ET)
- Optimizes storage by focusing on market hours, improving compression ratios
- Supports millisecond-precision timestamps
- Technical Implementation
- Uses a hybrid compression approach:
- Differential encoding to reduce value sizes
- Binary packing with SIMD for parallel processing
- Variable integer compression for remaining values
- Block-based processing with 256-element blocks for SIMD efficiency
- Clean API with main struct
TmeMli
for handling datetime sequences
- Architecture
- Two main modules:
mcr
: Provides procedural macros for generating compression functionstms
: Implements the core compression algorithms and data structures
- Uses advanced Rust features like SIMD, unsafe blocks, and procedural macros
- Compression Algorithm
- Converts 12-byte NaiveDateTimes to compressed u32 representations
- Segments data into 256-element blocks for SIMD processing
- Uses differential encoding within blocks
- Applies binary packing with dynamic bit widths
- Falls back to varint compression for partial blocks
- Performance Optimizations
- SIMD-accelerated operations for parallel processing
- Efficient memory layout for fast access
- Optimized bit-packing for minimal storage overhead
- Fast decompression prioritized in design
- Rust Expertise
- Advanced language features (SIMD, unsafe, procedural macros)
- Systems programming and memory management
- Performance optimization techniques
- Algorithm Design
- Compression algorithm implementation
- SIMD vectorization
- Binary data manipulation
- Financial Domain Knowledge
- Understanding of market data requirements
- Trading hours optimization
- Time series handling
- Software Engineering
- Clean API design
- Comprehensive testing
- Performance benchmarking
- Documentation
- Core functionality implemented and tested
- Some features pending implementation (floating-point compression)
- Active development with clear roadmap
- Comprehensive test coverage for existing features
This project showcases advanced Rust programming skills, understanding of low-level optimization techniques, and domain knowledge in financial data processing. It demonstrates the ability to work with complex algorithms while maintaining clean, maintainable code.
tms
is a work in progress. Some tests work. Some bugs are to be resolved. Some features are to be built.
The intent to offer existing work before a final version, and not wait until all details are resolved. I appreciate when others do the same. I learn from people who are sharing their progress, and hope you will too.
256-bit SIMD vectors on x86 chips are currently supported.
Project tasks:
- Date-time series compression and decompression
- 256-bit SIMD instructions
- Varint compression and decompression
- Some unit tests
- Evaluate floating point compression approaches
- Implement a floating point compression approach
- Refine public API functions for client usability
- Extensive unit tests
- Publish to the crates.io registry
The tms
crate is composed of the mcr
and tms
modules.
Figure 1. Modules in the
tms
crate.
The mcr
module, an acronym for "macro", provides procedural macros which generate compression functions. mcr
generates tms
.
The tms
module, an acronym for "time series", provides generated compression functions. tms
is meant for use by client libraries. Tests are in the tms
module, and are manually written.
The TmeMli
struct, an acronym for "Time Millisecond", represents a sequence of date-times with millisecond precision. TmeMli
compresses date-times for multiple days.
Values are compressed by day with the append_day
method. Values are accessible by day with the get_day
method. append_day
accepts an uncompressed list of NaiveDateTimes. get_day
returns the uncompressed list of NaiveDateTimes. Internally, NaiveDateTimes are stored as a list of compressed bytes.
Different types of compression are explained, and then combined in a final explanation of the tms
compression algorithm.
Compression can be explained with the analogy of a box with a gift in it. The box is a data type, and the gift is the actual data that's used.
Figure 2. An analogy of a data type as a box, and the gift in the box as the data that's used.
The gift in the box is what we want. Computer chips are physically etched to work with convenient-sized boxes. The box is convenient. It works. And we carry it around.
The gift in the box is so useful, we decide to do more things with more gifts.
Figure 3. Three boxes, each with gifts.
We start to notice that carrying and using more boxes isn't quite as convenient as when we had just one box. We look into each box and see that the gifts don't take up all the space of the boxes.
Why don't we just use the gifts?
- The used data, the gift, might sometimes expand into the whole box.
- The processor is physically designed to use boxes of a certain size.
We could use smaller boxes, if we have the option. Computers can use different sized boxes. If we understand the context of the data well enough, and know that we never need a large box, the simplest solution is to choose a smaller box. With smaller boxes, it's also possible that space is still left over in each box. Often the size of each box has to be as large as the largest gift.
In the case where we're given boxes from someone else, we may not have the option to choose a smaller box. For example, using a timestamp or 64-bit floating point value.
So another option can be to use one box for three gifts. That would make it easier to carry the gifts around. That's compression.
Figure 4. An analogy of compression as one box with three gifts.
To have three gifts in one box, each gift has to be moved from their original boxes. It takes time to move the gifts for the convenience of carrying one box.
Inside the computer processor, to use each gift, we have to move the gifts back into individual boxes before using them. It's effort to program. And it's effort for the processor to move gifts.
Figure 5. An analogy of compression as moving gifts between individual boxes and one box.
So, is it worth compressing data with having to move gifts around? It depends on the context and data. For time series, the answer is yes. The changes between points of a time series are often small enough to enable compression in a reasonble amount of time.
Differential encoding subtracts one point from another. Instead of remembering 1,000,000
, and 1,000,003
, we just remember 3
. We repeat that for pairs of points. It's easier for both humans and computers to just remember 3
. The requirement is that data be some sequential integers. The two points might be close neighbors; or, one point may be some distant starting point. An unmodified starting point is preserved to enable future decompression.
Figure 6. Differential encoding subtracts one point from another.
Binary packing is moving gifts next to one another into a single box. The gifts can be any size, and aren't necessarily the pre-defined sizes the computer likes to work with. The destination size can be the size of the largest gift.
Figure 7. An analogy of binary packing compression as placing gifts next to each other in a box.
Binary packing uses bit-shift operations to move the gifts in and out of boxes. Bit-shifting is fast for a computer.
Variable integer compression, also known as varint
, moves a gift to a box that has a pre-defined size that's just big enough. The computer usually works with pre-defined box sizes of 1
,2
,4
, and 8
bytes. If an integer uses one byte within a four-byte box, preserve just one byte.
Figure 8. An analogy of variable integer compression as moving a gift to a smaller box with a pre-defined size.
varint
is relatively short to program. Ten lines of code are used in the tms
compress function usize_pck
.
varint
is used in the gRPC remote procedure framework.
SIMD instructions make data processing fast. SIMD means "single instruction, multiple data". One processor instruction does one thing on mutliple pieces of data, instead of doing one thing on one piece of data.
SIMD programming takes additional effort, and can be applied to differential encoding and binary packing.
The intent is to compress a series of date-times into a small size, while preferring fast decompression. NaiveDateTimes are shaped into forms which are readily used by SIMD instructions.
Times are limited between the hours of 9:30am
to 4:00pm
, which corresponds to a NYSE Core Trading Session. This choice provides improved compression size, and supports SIMD use with u32s. The 6.5 hour day limits a maximum millisecond, and maximum quantity of milliseconds to 23,400,000.
1,000
milliseconds *60
seconds *60
Minutes *6.5
hours =23,400,000
milliseconds per day
25 bits can represent any millisecond within the 6.5 hour range, 2^25 = 33,554,432
. In practice, 4-byte unsigned integers are used during intermediate compression steps, 2^32 = 4,294,967,296
.
Differential encoding subtracts the date and 9:30am hour from each date-time. The internal representation of each time becomes smaller. The list of 12-byte NaiveDateTimes are converted to a list of 4-byte u32s. One 32-bit integer stores the date part, and other 32-bit integers store milliseconds from the start of the day.
Figure 9. A list of 12-byte NaiveDateTimes compressed to a list of u32s using differential encoding.
The list of u32s is then segmented into blocks to support SIMD instructions. Each block has 256 elements. And the last block may have a variable length, depending on the data.
Figure 10. A list of u32s, represented as milliseconds, are segmented into blocks of 256 elements. The last block may have a variable number of elements.
Applying another round of differential encoding at the block level improves compression size. The additional step is reasonable with SIMD processing being fast.
A SIMD instruction subtracts one list from another. Each list has eight unsigned integers. The first list of eight integers is subtracted from consecutive lists within the block. The first list of integers is stored uncompressed. Remaining delta values are stored.
Figure 11. Subtracting two lists of unsigned integers with a SIMD instruction. Unsigned integers are represented as milliseconds. Elements are subtracted at corresponding indexes.
While the milliseconds are sequential, they can be randomly distributed. Notice that the differences are not sequential.
The differential encodings within a block is scanned for a maximum. The maximum is set as the box size for all values in the block. A list of unsigned integers are bit-shifted with SIMD instructions.
Figure 12. Moving bits into smaller boxes with SIMD bit-shifting.
A single box size per block supports SIMDs ability to do one thing on many pieces of data. The box size varies from block-to-block for to improve compression size while allowing decompression speed.
Any variable length block at the end of the list is compressed using varint
. A variable length list may or may not exist, depending on the data. The varint
compression doesn't use SIMD.
The binary packed bytes representing a day are appended to the end of a buffer with previous days. Various bookkeeping variables track the total number of days, the byte index of each day, and other items.
tms
uses three and four character identifiers for modules (mcr
, tms
), structs (TmeMli
, Itm
, BlkItr
), and functions (gen_blk
, u32_pck
, u32x256_unp
).
It's a style with a logic.
It fits well in the mind, once adjusted to. It composes larger concepts with less cognitive overload for human parsing and reasoning. It's a style seen in C, assembly, and Rust.
It's a compressed word form that can be expanded, if more detailed explanation is needed. It allows for the accumulation and juggling of many parts more quickly and succintly.
When I read gen_blk
, internally I'm translating and understand it means "Generate Block". Some writers choose generate_block
. Equally valid. The style also depends on the type of engineering and audience. For a low-level Rust library using SIMD and bit-shifting, three-character naming is helpful.
Figures were created with the Material 3 visual design system, Figma, and M3 design kit.