Commit feae9f0: update readme
MuggleWei committed Jan 10, 2024 (1 parent: acc360e)
Showing 2 changed files with 84 additions and 4 deletions.
78 changes: 77 additions & 1 deletion README.md
@@ -1,2 +1,78 @@
* [readme 中文](./README_cn.md)
* [readme EN](./README.md)

## Overview
bdchecker (**B**ackup **D**ata Checker) is a tool for checking personal cold-backup data, helping you discover data corruption in time.

## Why use it
Imagine we have some data that needs a cold backup: perhaps the daily compressed raw market data of some financial market, a personal collection of classic movies, or keys that go untouched all year. Let's first list some storage options:

| Storage option | Lifespan |
| ---- | ---- |
| SSD | several years to more than a decade |
| HDD | 10+ years |
| tape drive | 30+ years |
| punched paper tape | thousands of years |
| carved in stone (Luo Ji raised his crutch above his head and shouted solemnly) | millions of years |

There is no doubt that if you have the financial resources to engrave the information in stone and store it properly, it should be very safe, short of a dual-vector-foil attack; but for an individual, the cost of reading the information back off the stone far exceeds the value of the data we need to preserve.
When ease of reading and writing matters, the hard disk is without doubt the most convenient choice; but this brings an additional requirement: we need to check regularly whether the data has become corrupted. That is why **bdchecker** exists.

## Install
* use pip
```
pip install bdchecker
```
* or download an archive from the project's [Releases](https://github.com/MuggleWei/bdchecker/releases) page and decompress it

## Usage
**bdchecker** includes three sub-commands (a typical workflow is sketched after this list):
* gen: scan a directory, recursively traversing it to generate hash information for all **new** files, placed in the `.bdchecker.meta` folder.
* clean: scan a directory and remove deleted files from the hash information.
* check: scan a directory and find corrupted files (note that this operation computes the hash of every file, so it is the most time-consuming).
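
A typical routine might look like the sketch below, where `/mnt/backup` stands in as a placeholder for wherever the cold-backup data is mounted:
```
# illustrative workflow: record hashes for any new files, then verify everything
bdchecker gen -d /mnt/backup -v 1
bdchecker check -d /mnt/backup -v 1
```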

### Example directory
Assume that we currently have the following directory structure
```
data
├──── a.txt
├──── b.txt
└──── c
├──── c1.txt
└──── c2.txt
```
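
To follow along, the example tree can be created with standard shell commands (the file contents here are arbitrary placeholders):
```
# build the example layout used in the sections below
mkdir -p data/c
echo "a" > data/a.txt
echo "b" > data/b.txt
echo "c1" > data/c/c1.txt
echo "c2" > data/c/c2.txt
```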

### Command: gen
Generate hash information
```
bdchecker gen -d data -v 1
```
* `-d`: the directory to generate hash information for
* `-v`: verbosity level

After the run completes, you will see the console output: `dump meta info to data/.bdchecker.meta/sha256.csv`
When there are no new files in the directory, re-running the `gen` command does not regenerate any hash information.
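
For intuition, the recorded information is conceptually similar to what the standard `sha256sum` tool produces; the sketch below is only an analogy and implies nothing about bdchecker's actual file format:
```
# illustrative only: hash every file under data/, skipping the meta folder;
# roughly the kind of per-file records bdchecker keeps in sha256.csv
find data -type f -not -path "*/.bdchecker.meta/*" -exec sha256sum {} + | sort
```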

### Command: clean
Remove `data/c/c2.txt`, then run
```
bdchecker clean -d data -v 1
```
The last few lines of the log show: `clean missing file's meta info: c/c2.txt`, which means the hash information for the deleted file has been cleaned successfully.

### Command: check
Run
```
bdchecker check -d data -v 1
```
The last line of the log reads: `all check pass`, which means there are no new or deleted files and no file is corrupted.

Now, let's modify `a.txt` by writing something random into it.
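Any edit will do; as a hypothetical example:
```
# corrupt a.txt by appending arbitrary bytes (any modification works)
echo "oops" >> data/a.txt
```
Then run the check again: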
```
bdchecker check -d data -v 1
```
This time an error message appears in the log: `check failed: a.txt, old hash: ..., cur hash: ...`, indicating that the content of `a.txt` has changed.

### Migration and comparison
The hash information generated by `bdchecker` is saved in the `.bdchecker.meta` folder inside the data directory, so when migrating you can simply move the entire folder as one unit.
If several copies of backup data already exist and no hashes have been generated yet, you can run `bdchecker gen` on each copy and then compare the generated files, as sketched below. Since the lines of the generated files are already sorted, commands such as `diff` can compare them directly.
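
Assuming two copies named `backup_a` and `backup_b` (placeholder names), and the `sha256.csv` path reported by `gen` above, the comparison could look like:
```
# record hashes for each copy, then diff the sorted hash lists
bdchecker gen -d backup_a -v 1
bdchecker gen -d backup_b -v 1
diff backup_a/.bdchecker.meta/sha256.csv backup_b/.bdchecker.meta/sha256.csv
```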
10 changes: 7 additions & 3 deletions README_cn.md
@@ -17,7 +17,7 @@ bdchecker (**B**ackup **D**ata Checker) is a tool for checking personal cold-backup data
| punched paper tape | thousands of years |
| carved in stone (Luo Ji raised his crutch above his head and shouted solemnly) | millions of years |

There is no doubt that with enough financial resources you could engrave the information in stone and store it properly; short of a dual-vector-foil attack, it should be very safe. But for an individual, the cost of reading the information back off the stone far exceeds the value of the data we need to preserve.
So when ease of reading and writing matters, the hard disk is without doubt the most convenient; but this brings an additional requirement: we need to check regularly whether the data has become corrupted, and that can be done with **bdchecker**.

## Install
@@ -31,7 +31,7 @@ pip install bdchecker
**bdchecker** includes three sub-commands:
* gen: scan the directory, recursively traversing it to generate hash information for all **new** files, placed in the directory's `.bdchecker.meta` folder
* clean: scan the directory and remove deleted files from the hash information
* check: scan the directory and find corrupted files (note that this operation computes the hash of every file, which is fairly time-consuming)

### Example directory
Assume the current directory structure is as follows
@@ -74,4 +74,8 @@ bdchecker check -d data -v 1
```
bdchecker check -d data -v 1
```
At this point, an error message appears in the log: `check failed: a.txt, old hash: ..., cur hash: ...`, indicating that the content of `a.txt` has changed.

## Migration and comparison
The hash information generated by `bdchecker` is saved in the `.bdchecker.meta` directory inside the data directory, so when migrating you can simply move the entire folder.
If several copies of cold-backup data already exist and no hashes have been generated yet, you can run `bdchecker gen` on each copy to generate its hash values and then compare the two files. Since the lines of the generated files are already sorted, commands such as `diff` can compare them directly.
