-
Hi! I don’t see an issue with binrw itself in this report, so I’m not sure I understand what is being requested here. Could you clarify what outcome you’re looking for? Was this meant to be opened as a Q&A discussion instead of a ticket? Let me know. Thanks!
-
This might be a question pointing at a documentation gap, or it could be a limitation of the existing API. I'm not sure whether it's fixable without some buffering or an API change, or whether it's just a limitation a developer needs to keep in mind.

The cookbook examples for validating checksums on read and for calculating checksums on write both use the same `map_stream` pattern, and my implementation of a checksum (based on that cookbook) in my first comment follows it: whenever there's a read, feed the bytes that were read into the hash. That pattern makes an assumption: reads are always sequential, and the stream is never seeked back over (or forward past) bytes that are supposed to be hashed.
I've got a highly-contrived example here. In it, I've put some seeks and backtracking to simulate the kind of non-sequential access that a real format can force the parser into. One other way around it in my highly-contrived example would be to buffer the checksummed region and hash it as raw bytes before parsing it.
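To make the failure mode concrete, here is a minimal, self-contained sketch of that cookbook-style pattern using only std traits. `Checksum` and its toy additive sum are illustrative stand-ins I've made up, not binrw API; a real implementation would wrap a proper digest:

```rust
use std::io::{Cursor, Read, Result, Seek, SeekFrom};

// Cookbook-style wrapper: every byte that passes through read() is fed to
// the hash, with no awareness of the stream position.
struct Checksum<R> {
    inner: R,
    sum: u64, // toy additive checksum, standing in for a real digest
}

impl<R> Checksum<R> {
    fn new(inner: R) -> Self {
        Self { inner, sum: 0 }
    }
}

impl<R: Read> Read for Checksum<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.inner.read(buf)?;
        for &b in &buf[..n] {
            self.sum = self.sum.wrapping_add(u64::from(b));
        }
        Ok(n)
    }
}

impl<R: Seek> Seek for Checksum<R> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        // Seeks pass straight through: the hash state is NOT rewound,
        // so backtracking double-counts whatever gets re-read.
        self.inner.seek(pos)
    }
}

fn main() -> Result<()> {
    let data = [1u8, 2, 3, 4];

    // Sequential read: every byte is hashed exactly once.
    let mut seq = Checksum::new(Cursor::new(data));
    let mut buf = [0u8; 4];
    seq.read_exact(&mut buf)?;
    assert_eq!(seq.sum, 10);

    // Backtracking read: the re-read bytes are hashed twice.
    let mut back = Checksum::new(Cursor::new(data));
    back.read_exact(&mut buf[..2])?; // bytes 1, 2
    back.seek(SeekFrom::Start(0))?;
    back.read_exact(&mut buf)?; // bytes 1, 2, 3, 4 again
    assert_eq!(back.sum, 13); // (1+2) + (1+2+3+4), not 10
    Ok(())
}
```

The second assertion is the whole problem in miniature: one backwards seek and the computed checksum no longer matches the bytes on disk.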
-
Thank you for your thoughts. I think there is a misunderstanding about the documentation that might be causing some confusion here. https://docs.rs/binrw is a reference guide, not a cookbook. Examples aren’t recipes; they are demonstrations to help authors understand how a feature might be used. They aren’t intended to cover every possible situation. The implementation of the checksumming stream in the documentation is deliberately elided because the goal is to show how `map_stream` is used, not to provide a complete hashing implementation.

Depending on what is covered by the MAC—i.e. whether or not whatever you’re seeking to during parsing is supposed to be included or not—you could just track the last hashed byte position in your stream implementation and only add bytes to the hash when the position being read matches, so non-sequential reads are ignored when hashing. You could also use wrapper types like:

```rust
#[derive(BinRead)]
#[br(stream = s, map_stream = HashStream::new)]
struct Hashed<T> where T: BinRead {
    inner: T,
    #[br(calc(s.hash()))]
    computed_hash: [u8; 32]
}

#[derive(BinRead)]
struct StoredHash<T> where T: BinRead {
    stored_hash: [u8; 32],
    #[br(assert(stored_hash == value.computed_hash))]
    value: Hashed<T>,
}
```

And then avoid non-sequential reads by storing the offsets and lazy-loading, or by using some of the other helpers the library provides. Or some combination of these approaches.

However, if you are trying to do a verify-then-parse, you will never be able to avoid two passes, since you can’t start parsing until you have verified the message. In this case, all of this is moot and you have no choice but to read the raw data once to calculate the hash, and then read it a second time when you are parsing.

In any case, I don’t know of anything binrw can really do to make this easier, since it doesn’t seem like you’re describing something that can be solved more easily than it already is in the generic case. Even if there were a reasonable way of mapping a stream only for some fields and then consuming the stream to retrieve a value from it (I can’t think of a way to do this that wouldn’t be shitty, since whatever grammar is used needs to be compatible with a normal Rust struct grammar), that still won’t get you what you want when it comes to sequentially hashing bytes that aren’t being read sequentially.

Let me know if this makes sense or if you have any other questions. I’ll convert this to a discussion since it isn’t reporting a specific defect in binrw but is more a question about how to parse a particular data format, which is what discussions are best for. Thanks!
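The "track the last hashed byte position" suggestion can be sketched like this. It is a hand-written illustration using plain std traits and a toy FNV-1a-style hash, not binrw's actual stream wrapper types; `SequentialHasher` and its field names are invented for the example:

```rust
use std::io::{Cursor, Read, Result, Seek, SeekFrom};

// Sketch: only bytes at the hashing "frontier" are fed to the hash, so
// bytes re-read after a backwards seek are ignored.
struct SequentialHasher<R> {
    inner: R,
    pos: u64,          // current stream position
    hashed_up_to: u64, // everything before this offset is already hashed
    hash: u64,         // toy FNV-1a state, standing in for a real digest
}

impl<R> SequentialHasher<R> {
    fn new(inner: R) -> Self {
        Self { inner, pos: 0, hashed_up_to: 0, hash: 0xcbf2_9ce4_8422_2325 }
    }
    fn hash(&self) -> u64 {
        self.hash
    }
}

impl<R: Read> Read for SequentialHasher<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.inner.read(buf)?;
        let start = self.pos;
        self.pos += n as u64;
        // Hash only the part of this read that advances the frontier; reads
        // entirely behind it (or beyond it after a forward seek) are skipped.
        if start <= self.hashed_up_to && self.pos > self.hashed_up_to {
            let skip = (self.hashed_up_to - start) as usize;
            for &b in &buf[skip..n] {
                self.hash = (self.hash ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
            }
            self.hashed_up_to = self.pos;
        }
        Ok(n)
    }
}

impl<R: Seek> Seek for SequentialHasher<R> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        self.pos = self.inner.seek(pos)?;
        Ok(self.pos)
    }
}

fn main() {
    let data = *b"abcdefgh";

    // Straight sequential read.
    let mut a = SequentialHasher::new(Cursor::new(data));
    let mut buf = [0u8; 8];
    a.read_exact(&mut buf).unwrap();

    // Same data, but with a backwards seek and a re-read in the middle.
    let mut b = SequentialHasher::new(Cursor::new(data));
    b.read_exact(&mut buf[..4]).unwrap();
    b.seek(SeekFrom::Start(0)).unwrap();
    b.read_exact(&mut buf[..4]).unwrap(); // re-read: ignored by the hash
    b.read_exact(&mut buf[4..]).unwrap();

    assert_eq!(a.hash(), b.hash());
}
```

With binrw, the same bookkeeping would live inside the stream type you pass to `map_stream`; whether skipped regions should instead be treated as an error depends on whether the MAC is supposed to cover them.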
-
I tried to base my checksum handling on this approach, but ran into a problem with enums: when the first variant fails partway through, binrw seeks back and tries the next one, yet the bytes already read were fed to the hash. So the checksum ends up calculated over a subset of bytes from the variant that was tried first, in addition to the bytes of the variant that eventually parses.

I think it would be worth adding a cautionary note to the documentation example for the checksum, because in all but the most trivial protocols binrw is going to have to seek, and that makes checksum calculation with this method impossible. Moreover, it has taken me hours to figure out why...
-
I'm implementing a reader and writer for an existing file format which is laid out something like this: a 32-byte checksum at the start of the file, followed by the payload from byte 32 onwards.
Based on the `map_stream` examples (which have a hash at the end of the file), I've ended up with a wrapper following the same pattern. This works, but there are edge cases with this approach:

- `read()` assumes that reads are always sequential from byte 32 onwards, and that the stream is never `seek()`ed to a position after byte 32.
- `write()` also assumes that writes are always sequential from byte 32 onwards, and that the stream is never `seek()`ed to a position after byte 32.

I think the `reader_var` test case also makes the assumption of sequential reads, though it's got the checksum field at the end.

Tangential to this, it'd be nice to be able to use this to verify a MAC, and do that verification before deserialising other data structures. However, this would require two read passes on a file, or buffering the entire file (which could be large) in memory.
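The two-pass MAC verification mentioned in the last paragraph might look like the following sketch. The 8-byte additive checksum, the layout, and `verify_then_parse` are all illustrative stand-ins, not this file format or binrw API; a real implementation would use a proper MAC and hand the buffered payload to binrw for the second pass:

```rust
use std::io::{Cursor, Read};

// Assumed toy layout: an 8-byte little-endian checksum of the payload,
// followed by the payload itself.
fn verify_then_parse(raw: &[u8]) -> Option<Vec<u8>> {
    if raw.len() < 8 {
        return None;
    }
    let (header, payload) = raw.split_at(8);
    let stored = u64::from_le_bytes(header.try_into().ok()?);

    // Pass 1: hash the raw payload bytes before deserialising anything.
    let computed: u64 = payload.iter().map(|&b| u64::from(b)).sum();
    if stored != computed {
        return None; // reject before parsing starts
    }

    // Pass 2: parse from the already-buffered bytes. With binrw this would
    // be a read from Cursor::new(payload) instead of read_to_end.
    let mut parsed = Vec::new();
    Cursor::new(payload).read_to_end(&mut parsed).ok()?;
    Some(parsed)
}

fn main() {
    let payload = [10u8, 20, 30];
    let mut file = 60u64.to_le_bytes().to_vec(); // 10 + 20 + 30
    file.extend_from_slice(&payload);

    assert_eq!(verify_then_parse(&file).as_deref(), Some(&payload[..]));

    file[8] ^= 0xff; // corrupt the payload: verification now fails
    assert_eq!(verify_then_parse(&file), None);
}
```

The cost is the buffering described above: the payload has to be held in memory (or read twice from disk) because parsing cannot begin until the whole region has been verified.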