doc: add migration guide to readme
nlfiedler committed Jan 25, 2023
1 parent d874c81 commit 61d2a0d
Showing 2 changed files with 28 additions and 6 deletions.
33 changes: 28 additions & 5 deletions README.md
@@ -21,9 +21,9 @@ which demonstrate reading files of arbitrary size into a memory-mapped buffer
and passing them through the different chunker implementations.

```shell
-$ cargo run --example v2016 -- --size 16384 test/fixtures/SekienAkashita.jpg
+$ cargo run --example v2020 -- --size 16384 test/fixtures/SekienAkashita.jpg
Finished dev [unoptimized + debuginfo] target(s) in 0.03s
-Running `target/debug/examples/v2016 --size 16384 test/fixtures/SekienAkashita.jpg`
+Running `target/debug/examples/v2020 --size 16384 test/fixtures/SekienAkashita.jpg`
hash=17968276318003433923 offset=0 size=21325
hash=4098594969649699419 offset=21325 size=17140
hash=15733367461443853673 offset=38465 size=28084
@@ -38,7 +38,7 @@ code snippet is an example:
let read_result = fs::read("test/fixtures/SekienAkashita.jpg");
assert!(read_result.is_ok());
let contents = read_result.unwrap();
-let chunker = fastcdc::v2016::FastCDC::new(&contents, 16384, 32768, 65536);
+let chunker = fastcdc::v2020::FastCDC::new(&contents, 16384, 32768, 65536);
let results: Vec<Chunk> = chunker.collect();
assert_eq!(results.len(), 2);
assert_eq!(results[0].offset, 0);
@@ -47,21 +47,44 @@ assert_eq!(results[1].offset, 66549);
assert_eq!(results[1].length, 42917);
```

## Migration from pre-3.0

If you were using a release of this crate from before 3.0, you will need to make a small adjustment to continue using the same implementation as before.

Before the 3.0 release:

```rust
use fastcdc::ronomon as fastcdc;
use std::fs;
let contents = fs::read("test/fixtures/SekienAkashita.jpg").unwrap();
let chunker = fastcdc::FastCDC::new(&contents, 8192, 16384, 32768);
```

After the 3.0 release:

```rust
use std::fs;
let contents = fs::read("test/fixtures/SekienAkashita.jpg").unwrap();
let chunker = fastcdc::ronomon::FastCDC::new(&contents, 8192, 16384, 32768);
```

The cut points produced will be identical to those of previous releases, because the `ronomon` implementation was never changed in a way that affects them. Note, however, that the other implementations _will_ produce different results.

## Reference Material

The original algorithm is described in [FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication](https://www.usenix.org/system/files/conference/atc16/atc16-paper-xia.pdf), while the improved "rolling two bytes each time" version is detailed in [The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems](https://ieeexplore.ieee.org/document/9055082).

## Other Implementations

* [jrobhoward/quickcdc](https://github.com/jrobhoward/quickcdc)
- + Similar but slightly earlier algorithm by some of the same researchers?
+ + Similar but slightly earlier algorithm by some of the same authors?
* [rdedup_cdc at docs.rs](https://docs.rs/crate/rdedup-cdc/0.1.0/source/src/fastcdc.rs)
+ Alternative implementation in Rust.
* [ronomon/deduplication](https://github.com/ronomon/deduplication)
+ C++ and JavaScript implementation of a variation of FastCDC.
* [titusz/fastcdc-py](https://github.com/titusz/fastcdc-py)
+ Pure Python port of FastCDC. Compatible with this implementation.
-* [wxiacode/destor](https://github.com/wxiacode/destor/blob/master/src/chunking)
+* [wxiacode/FastCDC-c](https://github.com/wxiacode/FastCDC-c)
+ Canonical algorithm in C with gear table generation and mask values.
* [wxiacode/restic-FastCDC](https://github.com/wxiacode/restic-FastCDC)
+ Alternative implementation in Go with additional mask values.
1 change: 0 additions & 1 deletion TODO.org
@@ -1,6 +1,5 @@
* Action Items
** TODO Rewrite
-*** TODO doc: add migration guide to =README.md=
*** TODO incorporate some form of streaming support based on =Read=
**** c.f. https://gitlab.com/asuran-rs/asuran/ (asuran-chunker, uses =fastcdc= with =Read=)
**** basically just allocate a buffer 2*max and fill it as needed
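
The streaming idea in the remaining TODO item (allocate a buffer of 2*max and fill it as needed) might look roughly like the sketch below. It is an illustration under stated assumptions, not the crate's implementation: `find_cut`, `fill`, and `chunk_stream` are hypothetical names, and `find_cut` is a placeholder that simply cuts at `max` bytes where a real chunker would search for a content-defined cut point.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a real cut-point search (e.g. FastCDC):
/// it cuts at `max` bytes, or at the end of the available data.
fn find_cut(data: &[u8], max: usize) -> usize {
    data.len().min(max)
}

/// Top up `buf` from `reader` until it holds `capacity` bytes or the
/// reader is exhausted. Returns the number of bytes now in `buf`.
fn fill<R: Read>(reader: &mut R, buf: &mut Vec<u8>, capacity: usize) -> io::Result<usize> {
    let mut tmp = [0u8; 4096];
    while buf.len() < capacity {
        let want = (capacity - buf.len()).min(tmp.len());
        let n = match reader.read(&mut tmp[..want]) {
            Ok(n) => n,
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            break; // end of stream
        }
        buf.extend_from_slice(&tmp[..n]);
    }
    Ok(buf.len())
}

/// Chunk a stream using a buffer of 2 * max bytes.
/// Returns (offset, length) pairs, one per chunk.
fn chunk_stream<R: Read>(mut reader: R, max: usize) -> io::Result<Vec<(u64, usize)>> {
    let capacity = 2 * max;
    let mut buf: Vec<u8> = Vec::with_capacity(capacity);
    let mut offset = 0u64;
    let mut chunks = Vec::new();
    loop {
        // Refill before every search; stop once nothing is left.
        if fill(&mut reader, &mut buf, capacity)? == 0 {
            break;
        }
        let len = find_cut(&buf, max);
        chunks.push((offset, len));
        offset += len as u64;
        // Drop the emitted chunk; leftover bytes stay for the next search.
        buf.drain(..len);
    }
    Ok(chunks)
}
```

Calling `chunk_stream(std::io::Cursor::new(bytes), max)` on an in-memory source exercises the same path as a file or socket. After each chunk is drained, the leftover bytes stay in the buffer and are topped up before the next search, so the whole input never needs to be in memory at once.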