-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creation of FramesManager with Ghost Frame Optimization #68
Conversation
Thoughts on the topic. Should to_bytes and from_bytes dump/read the entire object into/from bytes? Wouldn't it be better to take individual fields from it? Our results differ from the Rust implementation of this class now, even if compressor and data are the same, is this the desired behavior? |
Also, I am contemplating whether ghost frames actually enhance performance. I believe we need to conduct tests on this matter because I implemented this feature since it was in our weekly plan. However, from the project's perspective, how sensible is such a mechanic? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going over your implementation (which is very solid) made me do some math and think if we can do better.
ConstantFrame without residuals (best case we have) will have 9 Bytes at worst case (can be as small as 2 Bytes).
The Frame manager will store, not only the Frame but also the Hashmap. The hashmap has always 16 bytes (!!). 8 for the hash, 8 for the pointer.
So, on the absolute best case, the Ghost Frame structure is between 2x and 8x the size of our best case. In other words, we would need to replace 3 to 9 frames per file for this to make sense.
My suggestion since we have a bench data done already. Run this branch against it, see what we get out of that, and do a data guided approach?
Let's keep this discussion going. I love the implementation simplicity.
@@ -94,18 +95,18 @@ mod tests { | |||
let mut cs = CompressedStream::new(); | |||
cs.compress_chunk_with(&vector1, Compressor::Constant); | |||
let b = cs.to_bytes(); | |||
assert_eq!(b, [66, 82, 82, 79, 0, 1, 41, 251, 0, 4, 3, 3, 0, 2, 0]); | |||
assert_eq!(b, [66, 82, 82, 79, 0, 1, 41, 251, 0, 4, 3, 3, 0, 2, 0, 1, 253, 184, 80, 252, 43, 206, 253, 150, 101, 41, 251, 0, 4, 3, 3, 0, 2, 0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it more than doubles our best data usage. Although our best case is EXCEPTIONALLY good, we must make sure this doesn't propagate (as, in, this is the cost of the static structure). Otherwise, the ghost frames might not bring a ton (or any decent benefit). If they do bring (I do still believe they do), and this space usage is creating issues, we might have to look into other structures.
Another option is we throw RKYV at this, it does support pointers well: https://docs.rs/rkyv/latest/rkyv/ (We would have to replace bincode that does a better job regarding data size). |
Closing since this was not improving performance but was an excelent step into pursuing other ways. |
Overview
As outlined in our project roadmap, this pull request introduces the fully implemented FramesManager. This module incorporates our anticipated Ghost Frames, which are designed to enhance memory efficiency by avoiding the storage of duplicate frame data.
Features
Implementation
The FramesManager is built around two core data structures:
When a new frame is added, we compute its hash and utilize the entry API for efficient handling of existing or new frames. If a frame with the same hash is found, we reuse the Arc pointing to that frame; otherwise, we create a new Arc and store it.
Test Details
The added tests validate various scenarios, encompassing both the compression and decompression processes, as well as the reference handling of frames:
test_constant_decompression_many_different_chunks: This test ensures that the decompression yields the correct output from multiple different compressed chunks.
test_many_same_chunks: Validates that the FramesManager correctly handles the addition of frames, expecting two separate frames when identical chunks are compressed.
test_constant_decompression_many_same_chunks: Checks that decompression is accurate when multiple identical chunks are compressed, ensuring the output is as expected.
ref_check_same_chunks: Uses Arc::ptr_eq to confirm that identical chunks result in the same Arc reference within the FramesManager, proving that the Ghost Frame logic is correctly identifying and managing duplicate frames.
ref_check_different_chunks: Confirms that different chunks do not share the same Arc reference, which would validate the FramesManager's ability to distinguish and store unique frames separately.