Add the ability of only reconstructing certain sub-chunk of certain shards #79
Comments
I think the following notes, which came to mind as I briefly revisited the code, might be worth mentioning:
It might be easiest to modify […]. Either that, or adjust the […]. And I might be way off as well - I haven't read the code base in forever. |
At least currently we only need to reconstruct some data shards, so
maybe the interface for now should just allow asking for those. So maybe
for now I'll just implement
```
reconstruct_some_data_shards
```
instead. @burges thoughts?
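For the sake of discussion, a minimal sketch of what such an interface could look like, assuming shards are passed as `Option`-wrapped byte vectors; the error type and exact signature are placeholders, not the crate's actual API:
```rust
/// Placeholder error type for this sketch; the crate would use its own.
#[derive(Debug)]
pub struct ReconstructError;

/// Sketch: reconstruct only the data shards the caller asked for.
/// `shards[i]` is `Some(bytes)` if shard i is present and `None` if missing;
/// `wanted` lists the indices of the data shards that should be rebuilt.
pub fn reconstruct_some_data_shards(
    shards: &mut [Option<Vec<u8>>],
    wanted: &[usize],
) -> Result<(), ReconstructError> {
    // A real implementation would:
    //  1. check that at least k shards (data or parity) are present,
    //  2. build the decoding matrix once,
    //  3. recover only the rows listed in `wanted`, leaving the other
    //     missing shards as `None`.
    let _ = (shards, wanted);
    unimplemented!("interface sketch only")
}
```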
|
We're interested in the "perpendicular" situation with partial shards: we have a k-of-n encoding of a message of size m. We're using GF(2^16) since n > 256, so each R-S polynomial contains exactly 2k bytes of data. As m >> 2k we necessarily split the message up across several R-S polynomials, meaning we rounded m up to some m' such that 2k | m'. We want the data between indices u and v. We let u' be u rounded down so that 2k | u' and let v' be v rounded up so that 2k | v'. There are now specific R-S polynomials that contain the data between indices u' and v', so we fetch k shares of each of those R-S polynomials, but only those. We then reconstruct only the data between indices u' and v', and return the requested subslice between indices u and v.

We ran into an interesting authentication hiccup with this approach however: we authenticate each retrieved shard using Merkle tree proofs. We must hash shards using a deeper Merkle tree with leaves of size exactly 2k, which requires exposing an annoying amount from this crate. We could sign these partial u'-to-v' shards instead of refining the Merkle proof, and then punish invalid partial shards, but doing k signatures maybe sucks here.

It's harder but maybe more useful to implement a BCH flavor of R-S codes that actually does error correction; not sure about all the trade-offs though. Berlekamp–Massey decoder? Euclidean decoder? etc. We usually favor authentication via Merkle proofs over doing correction because error correction requires downloading twice the error rate, making Merkle proofs cheaper. We're addressing a narrower threat model here though, so maybe error correction with BCH suffices, not sure.

We could also just reconstruct from partial shards optimistically, check the partial reconstructed data's hash against another source, and then do a full reconstruction if that fails. Apologies for rambling there..
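For concreteness, a small sketch of the rounding arithmetic described above; the function name and the example value of k are made up for illustration, nothing here is this crate's API:
```rust
// Sketch of the index arithmetic: with a k-of-n code over GF(2^16),
// each R-S polynomial holds exactly 2*k bytes of the message.
fn polynomials_for_range(k: usize, u: usize, v: usize) -> (usize, usize, std::ops::Range<usize>) {
    let stride = 2 * k;                                  // bytes of data per polynomial
    let u_prime = (u / stride) * stride;                 // u rounded down so that 2k | u'
    let v_prime = ((v + stride - 1) / stride) * stride;  // v rounded up so that 2k | v'
    // Only these polynomials need k shares fetched and decoded.
    let polys = (u_prime / stride)..(v_prime / stride);
    (u_prime, v_prime, polys)
}

fn main() {
    // e.g. k = 342 (684 bytes of data per polynomial), requesting bytes 1000..5000
    let (u_prime, v_prime, polys) = polynomials_for_range(342, 1000, 5000);
    println!("fetch polynomials {:?}, decode bytes {}..{}", polys, u_prime, v_prime);
    // The caller then returns the subslice 1000..5000 of the decoded u'..v' range.
}
```
|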
I'd rather open a new issue for any error-correction related matters
and move the discussion about error correction there.
But on the matter of partial recovery: I realized that I had
misunderstood the requirement when I originally created the ticket.
TL;DR: not only do we want to limit the reconstruction to a specific
shard rather than the whole data, but also to a specific sub-chunk of that
shard, and we only have the corresponding chunk from the other data and
parity shards to reconstruct it.
So assume that for every shard, bytes [0, .., n-1] and [n+m+1, shard_length] are
missing and we only have bytes [n, n+m] from more than k shards (k being
the reconstruction threshold), perhaps because we only asked for that much
data due to network bandwidth constraints.
I'm going to edit the ticket title to reflect the actual requirement.
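To make the requirement concrete, here is a hedged interface sketch; the function name, error type, and the layout assumption (that offset j of every shard belongs to the same codeword, as in the GF(2^16) description above) are all assumptions for illustration, not the crate's API:
```rust
/// Placeholder error type for this sketch.
#[derive(Debug)]
pub struct ReconstructError;

/// Sketch: rebuild only byte range `range` (i.e. [n, n+m]) of the shards in
/// `wanted`, given that same byte range from at least k other shards.
/// If the code is applied position-wise across shards (offset j of every
/// parity shard is a combination of offset j of the data shards), each
/// offset decodes independently, so partial chunks of >= k shards suffice
/// to rebuild the same chunk of a missing shard.
pub fn reconstruct_sub_chunks(
    partial_chunks: &mut [Option<Vec<u8>>], // each Some(chunk) holds only the bytes of `range`
    range: std::ops::Range<usize>,          // byte offsets [n, n+m) within the full shard
    wanted: &[usize],                       // shard indices whose chunk should be rebuilt
) -> Result<(), ReconstructError> {
    // With GF(2^16) symbols, `range` would also need to be 2-byte aligned.
    let _ = (partial_chunks, range, wanted);
    unimplemented!("interface sketch only")
}
```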
|
Okay, that's interesting. Is the sub-chunk always contiguous? |
Yes. It's mostly an authentication question: how much do we want hashing and Merkle proofs to overlap with the erasure coding? I suppose a shard type could expose an iterator over the data in the same polynomial, which I guess is every even index. |
We have situations where we have retrieved some data and parity shards but are only interested in recovering a specific subset of the shards instead of the whole encoding, mainly due to performance concerns. As such, it is useful to have a
reconstruct_subset_of_shards
function. I'm working on it and opened this issue to discuss and track implementation details, etc.
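A hedged caller-side sketch of how this might be used; the name `reconstruct_subset_of_shards` comes from this issue, but the signature and the body below are a compile-only stand-in, not a real implementation:
```rust
/// Stand-in for the proposed function so the example compiles; a real
/// implementation would decode only the shards listed in `wanted`.
fn reconstruct_subset_of_shards(
    data_shards: usize,
    parity_shards: usize,
    shards: &mut [Option<Vec<u8>>],
    wanted: &[usize],
) -> Result<(), String> {
    let present = shards.iter().filter(|s| s.is_some()).count();
    if present < data_shards {
        return Err(format!("need at least {} shards, have {}", data_shards, present));
    }
    let _ = (parity_shards, wanted);
    // ...decode only the wanted shards here...
    Ok(())
}

fn main() -> Result<(), String> {
    // 4 data shards + 2 parity shards; data shards 1 and 3 were not retrieved.
    let mut shards: Vec<Option<Vec<u8>>> = vec![
        Some(vec![1, 2]), None, Some(vec![5, 6]), None,   // data shards 0..=3
        Some(vec![9, 10]), Some(vec![11, 12]),            // parity shards 4..=5
    ];
    // The caller only needs data shard 1; shard 3 can stay missing.
    reconstruct_subset_of_shards(4, 2, &mut shards, &[1])?;
    Ok(())
}
```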