Chunkyard

Chunkyard is a fast backup application for Windows and Linux. It stores files in content-addressable storage and supports dynamic chunking and encryption.

The FastCDC chunking algorithm is a C# port of the Rust crate fastcdc-rs.
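To illustrate what dynamic (content-defined) chunking means, here is a minimal Python sketch of a gear-hash chunker. It is a simplified stand-in and not the FastCDC port that Chunkyard uses. The gear table, the function name `split_chunks` and all size parameters are assumptions made for this example.

```python
import hashlib

# Hypothetical gear table: 256 pseudo-random 32-bit values derived from SHA-256,
# so the sketch is deterministic without hard-coding a table.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def split_chunks(data, min_size=64, mask_bits=8, max_size=1024):
    """Cut data wherever the low mask_bits bits of a rolling fingerprint are zero."""
    mask = (1 << mask_bits) - 1
    chunks = []
    start = 0
    fingerprint = 0
    for position, byte in enumerate(data):
        # Shifting left gradually pushes older bytes out of the 32-bit window.
        fingerprint = ((fingerprint << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = position - start + 1
        at_boundary = length >= min_size and (fingerprint & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:position + 1])
            start = position + 1
            fingerprint = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder, may be shorter than min_size
    return chunks
```

Because a boundary depends on the bytes just before it rather than on a fixed offset, an insertion near the start of a file tends to change only nearby chunks, so later chunks can often be reused.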

I am building Chunkyard for myself. You might want to consider more sophisticated tools. Here’s a list of options.

You can install Chunkyard by:

  • Downloading a release (no dependencies required)
  • Installing it as a dotnet tool

Goals

  • Cross-platform support. Chunkyard ships as two binaries, chunkyard (Linux) and chunkyard.exe (Windows), which run without .NET being installed on your computer
  • Favor simplicity and readability over features and elaborate performance tricks
  • Strong symmetric encryption (AES Galois/Counter Mode using a 256-bit key)
  • Ability to copy from/to other repositories
  • Verifiable backups
  • No third-party dependencies

Non-Goals

  • Key management
  • Asymmetric encryption
  • Compression
  • Extended file system features such as OS-specific flags or links
  • Extended “version control” features such as branching or tagging
  • Hiding chunk sizes to prevent CDC fingerprint attacks
  • Concurrent operations on a single repository using more than one Chunkyard process (e.g. creating a new backup while garbage collecting unused data)

Build

  • Install the .NET SDK
  • Run dotnet build src and dotnet test src to build and test Chunkyard

Publish

  • Optional: Create an annotated tag to define a new version number
  • Run ./publish to create Linux and Windows binaries in the artifacts directory

Usage

Type chunkyard help to see a list of all available commands. You can add --help to any command to get more information about what parameters it expects.

Example:

# List all available commands
chunkyard

# Learn more about the store command
chunkyard store --help

# See which files chunkyard would back up
chunkyard store --repository 'MyBackup' --path 'Music' 'Pictures' 'Videos' --dry-run

# Store a backup
chunkyard store --repository 'MyBackup' --path 'Music' 'Pictures' 'Videos'

# Check if the latest backup is valid
chunkyard check --repository 'MyBackup'

# Restore parts of the latest backup
chunkyard restore --repository 'MyBackup' --directory '.' --include 'mp3$'

# Keep the latest four backups
chunkyard keep --repository 'MyBackup' --latest '4'

You can find examples of how I use Chunkyard in my dotfiles.

Architecture

Concepts

  • Blob: Binary data (e.g. the content of a file) with some metadata
  • Snapshot: A set of BlobReferences. It describes the current state of a set of Blobs at a specific point in time
  • Repository: A store which Chunkyard uses to persist data
  • Chunk: An encrypted piece of a Blob or a Snapshot
  • Chunk ID: A hash address which can be used to retrieve Chunks
  • BlobReference: Contains Chunk IDs and metadata which can be used to restore a Blob
  • SnapshotReference: Contains Chunk IDs and metadata which can be used to restore a Snapshot
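The relationship between a Repository, Chunks and Chunk IDs can be sketched in a few lines of Python. This is an in-memory illustration only: the class and method names are hypothetical, SHA-256 is an assumed hash function, and encryption is left out.

```python
import hashlib


class InMemoryRepository:
    """Hypothetical stand-in for a repository: maps Chunk IDs to chunk bytes."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunk):
        # The Chunk ID is derived from the content (SHA-256 is an assumption).
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # Identical content maps to the same ID, so it is stored only once.
        self.chunks[chunk_id] = chunk
        return chunk_id

    def get(self, chunk_id):
        return self.chunks[chunk_id]


repository = InMemoryRepository()
first_id = repository.put(b"same content")
second_id = repository.put(b"same content")
print(first_id == second_id, len(repository.chunks))
```

Addressing chunks by their hash is what gives deduplication for free: storing the same bytes twice yields the same Chunk ID and a single stored entry.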

Main Components

These classes contain the most important logic:

Backup Workflow

  • Take a set of files
  • Split files into encrypted chunks, store them in a repository and return a list of BlobReferences
  • Bundle all BlobReferences into a Snapshot, store this Snapshot as encrypted chunks and return a SnapshotReference
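The backup steps above can be sketched in Python as follows. This is a rough illustration under several assumptions: the repository is a plain dictionary, chunks are cut at a fixed size instead of dynamically, encryption is omitted, and the JSON snapshot layout and all function names are hypothetical.

```python
import hashlib
import json


def store_chunks(repository, data, chunk_size=4):
    """Split data into chunks, store each under its hash and return the Chunk IDs."""
    chunk_ids = []
    for offset in range(0, len(data), chunk_size):
        # A real tool would pick boundaries dynamically and encrypt each chunk here.
        chunk = data[offset:offset + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        repository[chunk_id] = chunk
        chunk_ids.append(chunk_id)
    return chunk_ids


def store_snapshot(repository, files):
    """Back up files (name -> content) and return a SnapshotReference as a list of Chunk IDs."""
    blob_references = [
        {"name": name, "chunk_ids": store_chunks(repository, content)}
        for name, content in sorted(files.items())
    ]
    # The Snapshot itself is stored as chunks, just like file content.
    snapshot = json.dumps({"blob_references": blob_references}).encode("utf-8")
    return store_chunks(repository, snapshot)


repository = {}
snapshot_reference = store_snapshot(
    repository, {"a.txt": b"hello world", "b.txt": b"hello world"}
)
print(len(snapshot_reference), "snapshot chunks,", len(repository), "chunks in total")
```

In this sketch both files have identical content, so their BlobReferences list the same Chunk IDs and the file data is stored only once.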

Restore Workflow

  • Retrieve a Snapshot using a SnapshotReference
  • Retrieve, decrypt and reassemble all files using the BlobReferences in the given Snapshot
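A restore can be sketched in Python along the same lines. The sketch assumes a dictionary as the repository, SHA-256 Chunk IDs, no encryption, and a hypothetical JSON snapshot with a `blob_references` list. None of these details are taken from Chunkyard's actual format.

```python
import hashlib
import json


def load(repository, chunk_ids):
    """Fetch chunks by ID, verify each against its ID and concatenate them."""
    parts = []
    for chunk_id in chunk_ids:
        chunk = repository[chunk_id]  # a real tool would also decrypt the chunk here
        if hashlib.sha256(chunk).hexdigest() != chunk_id:
            raise ValueError(f"chunk {chunk_id} does not match its ID")
        parts.append(chunk)
    return b"".join(parts)


def restore_snapshot(repository, snapshot_reference):
    """Return the restored files as a name -> content mapping."""
    # Step 1: retrieve the Snapshot using the SnapshotReference.
    snapshot = json.loads(load(repository, snapshot_reference))
    # Step 2: reassemble every file from the Chunk IDs in its BlobReference.
    return {
        blob["name"]: load(repository, blob["chunk_ids"])
        for blob in snapshot["blob_references"]
    }
```

Re-hashing every chunk during the restore also shows how a content-addressed layout makes a backup verifiable: a chunk that no longer matches its ID is detected before any file is reassembled from it.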