[doc](backup)Modify backup and restore documentation #1906

Open
wants to merge 3 commits into
base: master
Original file line number Diff line number Diff line change
@@ -28,30 +28,64 @@ under the License.

## Description

This statement is used to back up the data under the specified database. This command is an asynchronous operation. After the submission is successful, you need to check the progress through the SHOW BACKUP command. Only backing up tables of type OLAP is supported.
This statement is used to back up the data under the specified database. This command is an asynchronous operation. After the submission is successful, you need to check the progress through the [SHOW BACKUP](./SHOW-BACKUP.md) command.

Only root or superuser users can create repositories.

grammar:
## Syntax

```sql
BACKUP SNAPSHOT [db_name].{snapshot_name}
TO `repository_name`
[ON|EXCLUDE] (
`table_name` [PARTITION (`p1`, ...)],
...
)
PROPERTIES ("key"="value", ...);
BACKUP SNAPSHOT <db_name>.<snapshot_name>
Review comment (Contributor):

Suggested change
BACKUP SNAPSHOT <db_name>.<snapshot_name>
BACKUP SNAPSHOT [<db_name>.]<snapshot_name>

TO `<repository_name>`
[ {ON|EXCLUDE]}
Review comment (Contributor):

Suggested change
[ {ON|EXCLUDE]}
[ { ON | EXCLUDE } ]

( <table_name> [ PARTITION ( <partition_name> [, ...] ) ]
[, ...] ) ]
[ PROPERTIES ( "<key>" = "<value>" [ , ... ] )]]
Review comment (Contributor):

Is there an extra square bracket at the end here?

```

illustrate:
## Required Parameters

**1.`<db_name>`**

The name of the database to which the data to be backed up belongs.

**2.`<snapshot_name>`**

The name of the data snapshot. Snapshot names must be globally unique and cannot be reused.

**3.`<repository_name>`**

Repository name. You can create a repository with [CREATE REPOSITORY](./CREATE-REPOSITORY.md).

## Optional Parameters

**1.`<table_name>`**

The name of the table to be backed up. If not specified, the entire database will be backed up.

- There can only be one executing BACKUP or RESTORE task under the same database.
- The ON clause identifies the tables and partitions that need to be backed up. If no partition is specified, all partitions of the table are backed up by default.
- The EXCLUDE clause identifies the tables and partitions that do not need to be backed up. All partitions of all other tables in the database are backed up.
- PROPERTIES currently supports the following properties:
- "type" = "full": indicates that this is a full update (default).
- "timeout" = "3600": the task timeout, in seconds. The default is one day.

**2.`<partition_name>`**

The name of the partition to be backed up. If not specified, all partitions of the corresponding table will be backed up.

**3.`[ PROPERTIES ( "<key>" = "<value>" [ , ... ] ) ]`**

Data snapshot attributes, in the format `<key>` = `<value>`. The following properties are currently supported:

- "type" = "full": indicates that this is a full update (default).
- "timeout" = "3600": the task timeout, in seconds. The default is one day.
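
As a rough sketch of how the optional parameters combine, the statement below backs up two partitions of one table plus a second whole table, with an explicit timeout. The names `example_db`, `snapshot_label_sketch`, `example_repo`, `example_tbl`, `example_tbl2`, `p1`, and `p2` are placeholders assumed for illustration.

```sql
-- Hypothetical names throughout; substitute your own database, repository, and tables.
BACKUP SNAPSHOT example_db.snapshot_label_sketch
TO example_repo
ON
(
    example_tbl PARTITION (p1, p2),  -- only partitions p1 and p2 of this table
    example_tbl2                     -- all partitions of this table
)
PROPERTIES
(
    "type" = "full",
    "timeout" = "7200"               -- task timeout in seconds (two hours here)
);
```

Because the command is asynchronous, a successful submission only means the job was accepted; progress still has to be checked with SHOW BACKUP.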

## Access Control Requirements

Only root or superuser users can create repositories.
Review comment (Contributor):

This should be described in terms of privileges: which privilege does a user need in order to create one?


## Usage Notes

- Only OLAP tables can be backed up.
- Only one backup operation can run at a time in the same database.
- The backup operation backs up the base table and the [Synchronous materialized view](../../../../query/view-materialized-view/materialized-view.md) of the specified table or partition, and only one replica is backed up. [Asynchronous materialized view](../../../../query/view-materialized-view/async-materialized-view.md) is not supported.
- Efficiency of backup operations: The efficiency of a backup operation depends on the amount of data, the number of Compute Nodes, and the number of files. Every Compute Node that holds a shard of the backup data takes part in the upload phase, so the more nodes there are, the higher the upload efficiency. The number of files refers only to the number of shards and the number of files in each shard. If there are many shards, or many small files within the shards, the backup may take longer.

## Example

@@ -90,19 +124,3 @@ EXCLUDE (example_tbl);
BACKUP SNAPSHOT example_db.snapshot_label3
TO example_repo;
```

## Keywords

BACKUP

## Best Practice

1. Only one backup operation can be performed under the same database.

2. The backup operation will back up the underlying table and [Synchronous materialized view](../../../../query/view-materialized-view/materialized-view.md) of the specified table or partition, and only one replica will be backed up. [Asynchronous materialized view](../../../../query/view-materialized-view/async-materialized-view.md) is not supported.

3. Efficiency of backup operations

The efficiency of backup operations depends on the amount of data, the number of Compute Nodes, and the number of files. Each Compute Node where the backup data shard is located will participate in the upload phase of the backup operation. The greater the number of nodes, the higher the upload efficiency.

The amount of file data refers only to the number of shards, and the number of files in each shard. If there are many shards, or there are many small files in the shards, the backup operation time may be increased.
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
"title": "CANCEL BACKUP",
"language": "en"
"title": "CANCEL BACKUP",
"language": "en"
}
---

@@ -24,28 +24,26 @@ specific language governing permissions and limitations
under the License.
-->



## Description

This statement is used to cancel an ongoing BACKUP task.

grammar:
## Syntax

```sql
CANCEL BACKUP FROM db_name;
CANCEL BACKUP FROM <db_name>;
```

## Parameters

**1.`<db_name>`**

The name of the database to which the backup task belongs.

Review comment (Contributor):

Doesn't this statement require any privileges?

## Example

1. Cancel the BACKUP task under example_db.

```sql
CANCEL BACKUP FROM example_db;
```

## Keywords

CANCEL, BACKUP

## Best Practice
Original file line number Diff line number Diff line change
@@ -30,13 +30,19 @@ under the License.

This statement is used to cancel an ongoing RESTORE task.

grammar:
## Syntax

```sql
CANCEL RESTORE FROM db_name;
CANCEL RESTORE FROM <db_name>;
Review comment (Contributor):

Doesn't this statement require any privileges?

```

Notice:
## Parameters

**1.`<db_name>`**

The name of the database to which the recovery task belongs.

## Usage Notes

- If the job is cancelled at or after the COMMIT stage of recovery, the table being restored may become inaccessible. In that case, the data can only be recovered by running the restore job again.
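
If a cancellation does leave a table inaccessible, the recovery path in the note above amounts to submitting the restore again. A minimal sketch, assuming a snapshot `snapshot_1` in a repository `example_repo` and a table `backup_tbl` (all hypothetical names), with a placeholder timestamp:

```sql
-- Cancel the in-flight restore job for the database.
CANCEL RESTORE FROM example_db;

-- Resubmit the restore to bring the table back to a usable state.
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (backup_tbl)
PROPERTIES
(
    "backup_timestamp" = "2018-05-04-16-45-08"  -- placeholder; use the snapshot's real timestamp
);
```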

@@ -48,8 +54,3 @@ Notice:
CANCEL RESTORE FROM example_db;
```

## Keywords

CANCEL, RESTORE

## Best Practice
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
"title": "RESTORE",
"language": "en"
"title": "RESTORE",
"language": "en"
}

---
@@ -25,44 +25,76 @@ specific language governing permissions and limitations
under the License.
-->




## Description

This statement is used to restore the data backed up by the BACKUP command to the specified database. This command is an asynchronous operation. After the submission is successful, you need to check the progress through the SHOW RESTORE command. Restoring tables of type OLAP is only supported.
This statement is used to restore the data backed up by the BACKUP command to the specified database. This command is an asynchronous operation. After the submission is successful, you need to check the progress through the [SHOW RESTORE](./SHOW-RESTORE.md) command.

grammar:
## Syntax

```sql
RESTORE SNAPSHOT [db_name].{snapshot_name}
FROM `repository_name`
[ON|EXCLUDE] (
`table_name` [PARTITION (`p1`, ...)] [AS `tbl_alias`],
...
RESTORE SNAPSHOT <db_name>.<snapshot_name>
Review comment (Contributor):

Suggested change
RESTORE SNAPSHOT <db_name>.<snapshot_name>
RESTORE SNAPSHOT [<db_name>.]<snapshot_name>

FROM `<repository_name>`
[{ON|EXCLUDE]} (
Review comment (Contributor):

Suggested change
[{ON|EXCLUDE]} (
[ { ON | EXCLUDE } ] (

`<table_name>` [PARTITION (`<partition_name>`, ...)] [AS `<table_alias>`]
[, ...] ) ]
)
PROPERTIES ("key"="value", ...);
[ PROPERTIES ( "<key>" = "<value>" [ , ... ] )]]
Review comment (Contributor):

Is there an extra square bracket at the end?

```

illustrate:
## Required Parameters

**1.`<db_name>`**

The name of the database to which the data to be restored belongs.

**2.`<snapshot_name>`**

The name of the data snapshot.

**3.`<repository_name>`**

Repository name. You can create a repository with [CREATE REPOSITORY](./CREATE-REPOSITORY.md).

**4.`[ PROPERTIES ( "<key>" = "<value>" [ , ... ] ) ]`**

Restore operation attributes, in the format `<key>` = `<value>`. The following properties are currently supported:

- "backup_timestamp" = "2018-05-04-16-45-08": Required. Specifies which timestamped version of the backup to restore. This value can be obtained with the `SHOW SNAPSHOT ON repo;` statement.
- "replication_num" = "3": Specifies the number of replicas for the restored table or partition. The default is 3. When restoring into an existing table or partition, the number of replicas must match that of the existing table or partition, and there must be enough hosts to hold all the replicas.
- "reserve_replica" = "true": The default is false. When this property is true, the replication_num property is ignored and the restored table or partition keeps the replica count it had before the backup. Tables, or partitions within a table, with different replica counts are supported.
- "reserve_dynamic_partition_enable" = "true": The default is false. When this property is true, the restored table keeps the value of 'dynamic_partition_enable' it had before the backup. Otherwise, the restored table is set to 'dynamic_partition_enable=false'.
- "timeout" = "3600": The task timeout, in seconds. The default is one day.
- "meta_version" = 40: Uses the specified meta_version to read previously backed-up metadata. This parameter is a temporary workaround and is intended only for restoring data backed up by an older version of Doris. Backups made by the latest version already contain the meta version, so it does not need to be specified.
- "clean_tables": Indicates whether to clean up tables that do not belong to the restore target. For example, if the target database contains tables that are not present in the snapshot, specifying `clean_tables` drops these extra tables and moves them to the recycle bin during the restore.
  - This feature is supported since Apache Doris 1.2.6.
- "clean_partitions": Indicates whether to clean up partitions that do not belong to the restore target. For example, if the target table contains partitions that are not present in the snapshot, specifying `clean_partitions` drops these extra partitions and moves them to the recycle bin during the restore.
  - This feature is supported since Apache Doris 1.2.6.
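
To illustrate how these properties fit together, the sketch below first lists the snapshots in a repository to find a timestamp, then restores a table while keeping its original replica count. The names `example_db`, `snapshot_1`, `example_repo`, and `backup_tbl` are hypothetical, and the timestamp is a placeholder.

```sql
-- Look up the available snapshots and their timestamps in the repository.
SHOW SNAPSHOT ON example_repo;

-- Restore one table, keeping the replica count it had before the backup.
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (backup_tbl)
PROPERTIES
(
    "backup_timestamp" = "2018-05-04-16-45-08",  -- placeholder; copy the value from SHOW SNAPSHOT
    "reserve_replica" = "true",                  -- replication_num is ignored when this is true
    "timeout" = "3600"                           -- task timeout in seconds
);
```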

## Optional Parameters

**1.`<table_name>`**

The name of the table to be restored. If not specified, the entire database will be restored.

- There can only be one executing BACKUP or RESTORE task under the same database.
- The tables and partitions that need to be restored are identified in the ON clause. If no partition is specified, all partitions of the table are restored by default. The specified table and partition must already exist in the warehouse backup.
- Tables and partitions that do not require recovery are identified in the EXCLUDE clause. All partitions of all other tables in the warehouse except the specified table or partition will be restored.
- The table name backed up in the warehouse can be restored to a new table through the AS statement. But the new table name cannot already exist in the database. The partition name cannot be modified.

**2.`<partition_name>`**

The name of the partition to be restored. If not specified, all partitions of the corresponding table will be restored.

**3.`<table_alias>`**

The new name under which the table is restored. The alias must not already exist in the database.
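
A short sketch of the ON clause with partitions and an alias, assuming hypothetical tables `backup_tbl` and `backup_tbl2` exist in the snapshot and that `new_tbl` does not yet exist in the target database:

```sql
RESTORE SNAPSHOT example_db.snapshot_2
FROM example_repo
ON
(
    backup_tbl PARTITION (p1, p2),  -- restore only partitions p1 and p2
    backup_tbl2 AS new_tbl          -- restore this table under a new name
)
PROPERTIES
(
    "backup_timestamp" = "2018-05-04-17-11-01"  -- placeholder timestamp
);
```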

## Usage Notes

- Only OLAP tables can be restored.
- Only one BACKUP or RESTORE task can run at a time in the same database.
- You can restore backed-up tables from the repository to replace existing tables of the same name in the database, provided the two table structures are exactly the same. The table structure includes the table name, columns, partitions, Rollups, and so on.
- You can restore only some partitions of a table; the system checks whether the partition Range or List matches.
- PROPERTIES currently supports the following properties:
- "backup_timestamp" = "2018-05-04-16-45-08": Specifies which time version of the corresponding backup to restore, required. This information can be obtained with the `SHOW SNAPSHOT ON repo;` statement.
- "replication_num" = "3": Specifies the number of replicas for the restored table or partition. Default is 3. If restoring an existing table or partition, the number of replicas must be the same as the number of replicas of the existing table or partition. At the same time, there must be enough hosts to accommodate multiple replicas.
- "reserve_replica" = "true": Default is false. When this property is true, the replication_num property is ignored and the restored table or partition will have the same number of replication as before the backup. Supports multiple tables or multiple partitions within a table with different replication number.
- "reserve_dynamic_partition_enable" = "true": Default is false. When this property is true, the restored table will have the same value of 'dynamic_partition_enable' as before the backup. if this property is not true, the restored table will set 'dynamic_partition_enable=false'.
- "timeout" = "3600": The task timeout period, the default is one day. in seconds.
- "meta_version" = 40: Use the specified meta_version to read the previously backed up metadata. Note that this parameter is used as a temporary solution and is only used to restore the data backed up by the old version of Doris. The latest version of the backup data already contains the meta version, no need to specify it.
- "clean_tables" : Indicates whether to clean up tables that do not belong to the restore target. For example, if the target db before the restore has tables that are not present in the snapshot, specifying `clean_tables` can drop these extra tables and move them into the recycle bin during the restore.
- This feature is supported since the Apache Doris 1.2.6 version
- "clean_partitions": Indicates whether to clean up partitions that do not belong to the restore target. For example, if the target table before the restore has partitions that are not present in the snapshot, specifying `clean_partitions` can drop these extra partitions and move them into the recycle bin during the restore.
- This feature is supported since the Apache Doris 1.2.6 version
- The table name backed up in the warehouse can be restored to a new table through the AS statement. But the new table name cannot already exist in the database. The partition name cannot be modified.
- Efficiency of recovery operations: With the same cluster size, a restore operation takes roughly as long as the corresponding backup operation. To speed up recovery, you can first restore a single replica by setting the `replication_num` parameter, and then raise the replica count with [ALTER TABLE PROPERTY](../../../../sql-manual/sql-statements/table-and-view/table/ALTER-TABLE-PROPERTY) to complete the remaining replicas.
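
The single-replica approach might look like the sketch below. The object names and timestamp are hypothetical, and the `ALTER TABLE ... SET` form is an assumption about the linked ALTER TABLE PROPERTY statement; partitioned tables may need a different form, so check that page before relying on it.

```sql
-- Step 1: restore with a single replica to shorten the restore itself.
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (backup_tbl)
PROPERTIES
(
    "backup_timestamp" = "2018-05-04-16-45-08",  -- placeholder timestamp
    "replication_num" = "1"
);

-- Step 2: once the restore job has finished, raise the replica count.
-- Assumed syntax; see the ALTER TABLE PROPERTY reference.
ALTER TABLE example_db.backup_tbl SET ("replication_num" = "3");
```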

## Example

@@ -106,21 +138,3 @@ PROPERTIES
"backup_timestamp"="2018-05-04-18-12-18"
);
```

## Keywords

```
RESTORE
```

## Best Practice

1. There can only be one ongoing recovery operation under the same database.

2. The table backed up in the warehouse can be restored and replaced with the existing table of the same name in the database, but the table structure of the two tables must be completely consistent. The table structure includes: table name, columns, partitions, materialized views, and so on.

3. When specifying a partial partition of the recovery table, the system will check whether the partition range can match.

4. Efficiency of recovery operations:

In the case of the same cluster size, the time-consuming of the restore operation is basically the same as the time-consuming of the backup operation. If you want to speed up the recovery operation, you can first restore only one copy by setting the `replication_num` parameter, and then adjust the number of copies by [ALTER TABLE PROPERTY](../../../../sql-manual/sql-statements/table-and-view/table/ALTER-TABLE-PROPERTY), complete the copy.
Original file line number Diff line number Diff line change
@@ -29,49 +29,47 @@ under the License.

This statement is used to view BACKUP tasks.

grammar:
## Syntax

```sql
SHOW BACKUP [FROM db_name]
[WHERE SnapshotName ( LIKE | = ) 'snapshot name']
SHOW BACKUP [FROM <db_name>]
[WHERE SnapshotName ( LIKE | = ) '<snapshot_name>' ]
Review comment (Contributor):

Suggested change
[WHERE SnapshotName ( LIKE | = ) '<snapshot_name>' ]
[WHERE SnapshotName { LIKE | = } '<snapshot_name>' ]

```

illustrate:

1. Only the most recent BACKUP task is saved in Doris.
2. The meaning of each column is as follows:
- `JobId`: Unique job id
- `SnapshotName`: The name of the backup
- `DbName`: belongs to the database
- `State`: current stage
- `PENDING`: The initial state after submitting the job
- `SNAPSHOTING`: Executing snapshot
- `UPLOAD_SNAPSHOT`: Snapshot completed, ready to upload
- `UPLOADING`: Snapshot uploading
- `SAVE_META`: Save job meta information to a local file
- `UPLOAD_INFO`: Upload job meta information
- `FINISHED`: The job was successful
- `CANCELLED`: Job failed
- `BackupObjs`: Backed up tables and partitions
- `CreateTime`: task submission time
- `SnapshotFinishedTime`: Snapshot completion time
- `UploadFinishedTime`: Snapshot upload completion time
- `FinishedTime`: Job finish time
- `UnfinishedTasks`: Displays unfinished subtask ids during SNAPSHOTING and UPLOADING stages
- `Status`: If the job fails, display the failure message
- `Timeout`: Job timeout, in seconds
## Parameters

## Example
**1.`<db_name>`**

1. View the last BACKUP task under example_db.
The name of the database to which the backup task belongs.

**2.`<snapshot_name>`**

```sql
SHOW BACKUP FROM example_db;
```
Backup name.

## Keywords
## Return Value

SHOW, BACKUP
| Column | Description |
| -- | -- |
| JobId | Unique job id |
| SnapshotName | The name of the backup |
| DbName | The database the job belongs to |
| State | Current stage:<ul><li>PENDING: The initial state after submitting the job.</li><li>SNAPSHOTING: Executing snapshot.</li><li>UPLOAD_SNAPSHOT: Snapshot completed, ready to upload.</li><li>UPLOADING: Snapshot uploading.</li><li>SAVE_META: Save job meta information to a local file.</li><li>UPLOAD_INFO: Upload job meta information.</li><li>FINISHED: The job was successful.</li><li>CANCELLED: Job failed.</li></ul> |
| BackupObjs | Backed up tables and partitions |
| CreateTime | Task submission time |
| SnapshotFinishedTime | Snapshot completion time |
| UploadFinishedTime | Snapshot upload completion time |
| FinishedTime | Job finish time |
| UnfinishedTasks | Displays unfinished subtask ids during SNAPSHOTING and UPLOADING stages |
| Progress | Task progress |
| TaskErrMsg | Display task error messages |
| Status | If the job fails, display the failure message |
| Timeout | Job timeout, in seconds |
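
The optional WHERE clause narrows the output to a particular snapshot. A brief sketch, assuming a snapshot named `snapshot_label3` in `example_db` and assuming that LIKE accepts the usual SQL `%` wildcard:

```sql
-- Exact match on the snapshot name.
SHOW BACKUP FROM example_db WHERE SnapshotName = 'snapshot_label3';

-- Pattern match (assumes the standard LIKE wildcard).
SHOW BACKUP FROM example_db WHERE SnapshotName LIKE 'snapshot_label%';
```

The `State` column of the result shows which stage the job has reached, and `Status` carries the failure message if the job was cancelled.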

## Example

## Best Practice
1. View the last BACKUP task under example_db.

```sql
SHOW BACKUP FROM example_db;
```
