Add the ability of only reconstructing certain sub-chunk of certain shards #79
Comments
I think the following notes, which came to mind as I briefly revisited the code, might be worth mentioning:
It might be easiest to modify […]. Either that, or adjust the […]. And I might be way off as well - I haven't read the code base in forever. |
At least currently we only need to reconstruct some data shards, so
maybe the interface for now should just allow asking for those. So maybe
for now I'll just implement
```
reconstruct_some_data_shards
```
instead. @burges thoughts?
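For the sake of discussion, a minimal sketch of what such an interface could look like, assuming shards are passed as `Option`-wrapped byte vectors; the error type and exact signature are placeholders, not the crate's actual API:
```rust
/// Placeholder error type for this sketch; the crate would use its own.
#[derive(Debug)]
pub struct ReconstructError;

/// Sketch: reconstruct only the data shards the caller asked for.
/// `shards[i]` is `Some(bytes)` if shard i is present and `None` if missing;
/// `wanted` lists the indices of the data shards that should be rebuilt.
pub fn reconstruct_some_data_shards(
    shards: &mut [Option<Vec<u8>>],
    wanted: &[usize],
) -> Result<(), ReconstructError> {
    // A real implementation would:
    //  1. check that at least k shards (data or parity) are present,
    //  2. build the decoding matrix once,
    //  3. recover only the rows listed in `wanted`, leaving the other
    //     missing shards as `None`.
    let _ = (shards, wanted);
    unimplemented!("interface sketch only")
}
```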
|
We're interested in the "perpendicular" situation with partial shards: we have a k-of-n encoding of a message of size m. We're using GF(2^16) since n > 256, so each R-S polynomial contains exactly 2k bytes of data. As m >> 2k we necessarily split the message up across several R-S polynomials, meaning we rounded m up to some m' such that 2k | m'. We want the data between indices u and v. We let u' be u rounded down so that 2k | u' and let v' be v rounded up so that 2k | v'. There are now specific R-S polynomials that contain the data between indices u' and v', so we fetch k shares of each of those R-S polynomials, but only those. We then reconstruct only the data between indices u' and v', and return the requested subslice between indices u and v.

We ran into an interesting authentication hiccup with this approach however: we authenticate each retrieved shard using Merkle tree proofs. We must hash shards using a deeper Merkle tree with leaves of size exactly 2k, which requires exposing an annoying amount from this crate. We could sign these partial u'-to-v' shards instead of refining the Merkle proof, and then punish invalid partial shards, but doing k signatures maybe sucks here.

It's harder but maybe more useful to implement a BCH flavor of R-S codes that actually does error correction; not sure about all the trade-offs though. Berlekamp–Massey decoder? Euclidean decoder? etc. We usually favor authentication via Merkle proofs over doing correction because error correction requires downloading twice the error rate, making Merkle proofs cheaper. We're addressing a narrower threat model here though, so maybe error correction with BCH suffices, not sure.

We could also just reconstruct from partial shards optimistically, check the partial reconstructed data's hash against another source, and then do a full reconstruction if that fails. Apologies for rambling there..
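For concreteness, a small sketch of the rounding arithmetic described above; the function name and the example value of k are made up for illustration, nothing here is this crate's API:
```rust
// Sketch of the index arithmetic: with a k-of-n code over GF(2^16),
// each R-S polynomial holds exactly 2*k bytes of the message.
fn polynomials_for_range(k: usize, u: usize, v: usize) -> (usize, usize, std::ops::Range<usize>) {
    let stride = 2 * k;                                  // bytes of data per polynomial
    let u_prime = (u / stride) * stride;                 // u rounded down so that 2k | u'
    let v_prime = ((v + stride - 1) / stride) * stride;  // v rounded up so that 2k | v'
    // Only these polynomials need k shares fetched and decoded.
    let polys = (u_prime / stride)..(v_prime / stride);
    (u_prime, v_prime, polys)
}

fn main() {
    // e.g. k = 342 (684 bytes of data per polynomial), requesting bytes 1000..5000
    let (u_prime, v_prime, polys) = polynomials_for_range(342, 1000, 5000);
    println!("fetch polynomials {:?}, decode bytes {}..{}", polys, u_prime, v_prime);
    // The caller then returns the subslice 1000..5000 of the decoded u'..v' range.
}
```
|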
I'd rather open a new issue for any error-correction related matters
and move the discussion about error correction there.
But on the matter of partial recovery: I realized that I had
misunderstood the requirement when I originally created the ticket.
TL;DR: not only do we want to limit the reconstruction to a specific
shard rather than the whole data, but also to a specific sub-chunk of that
shard, and we only have the corresponding chunk from the other data and
parity shards to reconstruct it.
So assume that for every shard, bytes [0, .., n-1] and [n+m+1, shard_length] are
missing and we only have bytes [n, n+m] from more than k shards (k being
the reconstruction threshold), perhaps because we only asked for that much
data due to network bandwidth constraints.
I'm going to edit the ticket title to reflect the actual requirement.
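To make the requirement concrete, here is a hedged interface sketch; the function name, error type, and the layout assumption (that offset j of every shard belongs to the same codeword, as in the GF(2^16) description above) are all assumptions for illustration, not the crate's API:
```rust
/// Placeholder error type for this sketch.
#[derive(Debug)]
pub struct ReconstructError;

/// Sketch: rebuild only byte range `range` (i.e. [n, n+m]) of the shards in
/// `wanted`, given that same byte range from at least k other shards.
/// If the code is applied position-wise across shards (offset j of every
/// parity shard is a combination of offset j of the data shards), each
/// offset decodes independently, so partial chunks of >= k shards suffice
/// to rebuild the same chunk of a missing shard.
pub fn reconstruct_sub_chunks(
    partial_chunks: &mut [Option<Vec<u8>>], // each Some(chunk) holds only the bytes of `range`
    range: std::ops::Range<usize>,          // byte offsets [n, n+m) within the full shard
    wanted: &[usize],                       // shard indices whose chunk should be rebuilt
) -> Result<(), ReconstructError> {
    // With GF(2^16) symbols, `range` would also need to be 2-byte aligned.
    let _ = (partial_chunks, range, wanted);
    unimplemented!("interface sketch only")
}
```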
|
Okay, that's interesting. Is the sub-chunk always contiguous? |
Yes. It's mostly an authentication question: how much do we want hashing and Merkle proofs to overlap with the erasure coding? I suppose a shard type could expose an iterator over the data in the same polynomial, which I guess is every even index. |
We have situations where we have retrieved some data and parity shards but are only interested in recovering a specific subset of the shards instead of the whole encoding, mainly due to performance concerns. As such, it is useful to have a
reconstruct_subset_of_shards
function. I'm working on it and opened this issue to discuss and track implementation details, etc.
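A hedged caller-side sketch of how this might be used; the name `reconstruct_subset_of_shards` comes from this issue, but the signature and the body below are a compile-only stand-in, not a real implementation:
```rust
/// Stand-in for the proposed function so the example compiles; a real
/// implementation would decode only the shards listed in `wanted`.
fn reconstruct_subset_of_shards(
    data_shards: usize,
    parity_shards: usize,
    shards: &mut [Option<Vec<u8>>],
    wanted: &[usize],
) -> Result<(), String> {
    let present = shards.iter().filter(|s| s.is_some()).count();
    if present < data_shards {
        return Err(format!("need at least {} shards, have {}", data_shards, present));
    }
    let _ = (parity_shards, wanted);
    // ...decode only the wanted shards here...
    Ok(())
}

fn main() -> Result<(), String> {
    // 4 data shards + 2 parity shards; data shards 1 and 3 were not retrieved.
    let mut shards: Vec<Option<Vec<u8>>> = vec![
        Some(vec![1, 2]), None, Some(vec![5, 6]), None,   // data shards 0..=3
        Some(vec![9, 10]), Some(vec![11, 12]),            // parity shards 4..=5
    ];
    // The caller only needs data shard 1; shard 3 can stay missing.
    reconstruct_subset_of_shards(4, 2, &mut shards, &[1])?;
    Ok(())
}
```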