docs: client-side checksumming; main readme
Signed-off-by: Alex Aizman <[email protected]>
alex-aizman committed Jan 29, 2025
1 parent 1fd95e0 commit 5fbbb70
Showing 5 changed files with 100 additions and 22 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -31,12 +31,13 @@ For ease of use, management, and monitoring, there's also:
```console
$ ais <TAB-TAB>

advanced cluster etl ls rmb stop
alias config evict object rmo storage
archive cp get performance scrub tls
auth create help prefetch show wait
blob-download download job put space-cleanup search
bucket dsort log remote-cluster start
advanced config get prefetch show
alias cp help put space-cleanup
archive create job remote-cluster start
auth download log rmb stop
blob-download dsort ls rmo storage
bucket etl object scrub tls
cluster evict performance search wait
```

AIS runs natively on Kubernetes and features open format - thus, the freedom to copy or move your data from AIS at any time using the familiar Linux `tar(1)`, `scp(1)`, `rsync(1)` and similar.
8 changes: 4 additions & 4 deletions cmd/cli/cli/app.go
@@ -332,13 +332,13 @@ func (a *acli) setupCommands(emptyCmdline bool) {

app.Commands = append(app.Commands, a.initAliases()...)

// alphabetically
setupCommandHelp(app.Commands)
a.enableSearch()

// finally, alphabetically
sort.Slice(app.Commands, func(i, j int) bool {
return app.Commands[i].Name < app.Commands[j].Name
})

setupCommandHelp(app.Commands)
a.enableSearch()
}

func (a *acli) enableSearch() {
13 changes: 7 additions & 6 deletions docs/cli.md
@@ -115,12 +115,13 @@ The recommended and, actually, fastest way to get started with CLI is to type `a
```console
$ ais <TAB-TAB>

advanced cluster etl ls rmb stop
alias config evict object rmo storage
archive cp get performance scrub tls
auth create help prefetch show wait
blob-download download job put space-cleanup search
bucket dsort log remote-cluster start
advanced config get prefetch show
alias cp help put space-cleanup
archive create job remote-cluster start
auth download log rmb stop
blob-download dsort ls rmo storage
bucket etl object scrub tls
cluster evict performance search wait
```

These are the current set of top-level commands. Each command has its own extended help (the `--help` option) and, usually, multiple sub-commands
75 changes: 75 additions & 0 deletions docs/cli/object.md
@@ -33,6 +33,7 @@ ls promote concat evict mv cat
- [Out of band updates](/docs/out_of_band.md)
- [PUT object](#put-object)
- [Object names](#object-names)
- [Put with client-side checksumming](#put-with-client-side-checksumming)
- [Put single file](#put-single-file)
- [Put single file with checksum](#put-single-file-with-checksum)
- [Put single file with implicitly defined name](#put-single-file-with-implicitly-defined-name)
@@ -597,6 +598,80 @@ All **examples** below put into an empty bucket and the source directory structu

The current user HOME directory is `/home/user`.

## Put with client-side checksumming

> **Motivation**: One way to speed up writes is to avoid redundant writes of user data. A write operation can effectively become a no-op if identical data already exists in the cluster. The conventional method to establish such identity is content checksumming.
> In short, here's a CLI write-optimizing trick that uses client-side checksumming.

### First PUT:

```console
$ time ais put /tmp/www s3://ais-aa --yes --compute-checksum --recursive

Files to upload:
EXTENSION COUNT SIZE
27 562.90MiB
.go 1 123B
.prev 1 7.62MiB
.txt 2 10.87KiB
TOTAL 31 570.53MiB
Uploaded 4(12%) objects, 9.6MiB (1%).
Uploaded 8(25%) objects, 55.3MiB (9%).
Uploaded 13(41%) objects, 105.8MiB (18%).
Uploaded 23(74%) objects, 315.7MiB (55%).
Uploaded 29(93%) objects, 449.3MiB (78%).
Uploaded 31(100%) objects, 570.5MiB (100%).

PUT 31 files (one directory, recursively) => s3://ais-aa

real 0m44.895s <<<<<<<<<<<<<<<<<<<<<< 45s
user 0m0.097s
sys 0m0.355s
```

### Second PUT with no changes at the source:

```console
$ time ais put /tmp/www s3://ais-aa --yes --compute-checksum --recursive

Files to upload:
EXTENSION COUNT SIZE
27 562.90MiB
.go 1 123B
.prev 1 7.62MiB
.txt 2 10.87KiB
TOTAL 31 570.53MiB

PUT 31 files (one directory, recursively) => s3://ais-aa

real 0m0.136s <<<<<<<<<<<<<<<<<<<<<<<<<<<< (PUT took no time)
user 0m0.107s
sys 0m0.509s
```

### Adding one file to the source:

```console
$ time ais put /tmp/www s3://ais-aa --yes --compute-checksum --recursive

Files to upload:
EXTENSION COUNT SIZE
28 563.17MiB
.go 1 123B
.prev 1 7.62MiB
.txt 2 10.87KiB
TOTAL 32 570.80MiB

PUT 32 files (one directory, recursively) => s3://ais-aa

real 0m1.029s <<<<<<<<<<<<<<<<<< 1s
user 0m0.121s
sys 0m0.588s
```

> **Note:** Ideally, the checksum is provided with PUT API calls. The CLI takes it one step further: if client-side checksumming is requested but the checksum is empty, the CLI computes it automatically. The corresponding overhead must be taken into account when analyzing resulting performance.
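
The idea behind the note above can be illustrated with a minimal, self-contained Go sketch. It is not the AIS implementation: the in-memory `store` type, the `putIfChanged` helper, and the choice of SHA-256 are all assumptions made for illustration. It only shows the general technique of computing a checksum on the client and skipping the write when the stored checksum already matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a hypothetical stand-in for a bucket: object name -> content checksum.
type store map[string]string

// checksum returns the hex-encoded SHA-256 of data.
// The checksum type is an assumption; a real cluster may be configured differently.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// putIfChanged "uploads" data only when the stored checksum differs from the
// client-computed one; it reports whether a write actually took place.
func putIfChanged(s store, name string, data []byte) bool {
	cksum := checksum(data) // client-side cost: one full read of the payload
	if old, ok := s[name]; ok && old == cksum {
		return false // identical content already present: the PUT is a no-op
	}
	s[name] = cksum // a real client would transfer the payload here
	return true
}

func main() {
	bucket := store{}
	fmt.Println(putIfChanged(bucket, "www/a.txt", []byte("hello")))  // first PUT writes
	fmt.Println(putIfChanged(bucket, "www/a.txt", []byte("hello")))  // unchanged source is skipped
	fmt.Println(putIfChanged(bucket, "www/a.txt", []byte("hello!"))) // modified content writes again
}
```

Note that the checksum is computed on every call, including the skipped ones, which is the client-side overhead the note refers to.
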
## Put single file

First, compare two simple examples:
13 changes: 7 additions & 6 deletions docs/index.md
@@ -41,12 +41,13 @@ For ease of use, management, and monitoring, there's also:
```console
$ ais <TAB-TAB>

advanced cluster etl ls rmb stop
alias config evict object rmo storage
archive cp get performance scrub tls
auth create help prefetch show wait
blob-download download job put space-cleanup search
bucket dsort log remote-cluster start
advanced config get prefetch show
alias cp help put space-cleanup
archive create job remote-cluster start
auth download log rmb stop
blob-download dsort ls rmo storage
bucket etl object scrub tls
cluster evict performance search wait
```

AIS runs natively on Kubernetes and features open format - thus, the freedom to copy or move your data from AIS at any time using the familiar Linux `tar(1)`, `scp(1)`, `rsync(1)` and similar.