Commit

Region migration user guide (#457)
* zh master

* zh latest

* en master

* en latest

* sidebar

* timecho sidebar

* relative link

* relative link
liyuheng55555 authored Dec 10, 2024
1 parent 75324c7 commit 5987d09
Showing 8 changed files with 296 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/.vuepress/sidebar/V1.3.3/en.ts
@@ -105,6 +105,7 @@ export const enSidebar = {
{ text: 'Stream Processing', link: 'Streaming_apache' },
],
},
{ text: 'Maintenance SQL', link: 'Maintennance' },
],
},
{
1 change: 1 addition & 0 deletions src/.vuepress/sidebar/V1.3.3/zh.ts
@@ -105,6 +105,7 @@ export const zhSidebar = {
{ text: '流处理框架', link: 'Streaming_apache' },
],
},
{ text: '运维语句', link: 'Maintennance'},
],
},
{
1 change: 1 addition & 0 deletions src/.vuepress/sidebar_timecho/V1.3.3/en.ts
@@ -114,6 +114,7 @@ export const enSidebar = {
{ text: 'Stream Processing', link: 'Streaming_timecho' },
],
},
{ text: 'Maintenance SQL', link: 'Maintennance' },
],
},
{
1 change: 1 addition & 0 deletions src/.vuepress/sidebar_timecho/V1.3.3/zh.ts
@@ -114,6 +114,7 @@ export const zhSidebar = {
{ text: '流处理框架', link: 'Streaming_timecho' },
],
},
{ text: '运维语句', link: 'Maintennance'},
],
},
{
69 changes: 69 additions & 0 deletions src/UserGuide/Master/Tree/User-Manual/Maintennance.md
@@ -346,6 +346,75 @@ Observing the results, we found that it is because the query did not add a time

The final optimization plan is: Add a time filtering condition to avoid a full table scan.

## Region Migration

Region migration carries an operational cost. Read this section in full before using the feature. If you have questions about the solution design, contact the IoTDB team for technical support.

### Feature introduction

IoTDB is a distributed database.
Balanced data distribution is important for balancing disk space usage
and write pressure across the cluster.
A region is the basic unit of distributed data storage in an IoTDB cluster.
For the concept in detail, see [region](../../Background-knowledge/Cluster-Concept.md).

During normal operation, the IoTDB kernel balances the data load automatically. In some scenarios, such as when a new DataNode joins the cluster or when data must be recovered after a disk failure on a DataNode's machine, regions need to be migrated manually to adjust the cluster load at a finer granularity.

The following diagram illustrates the region migration process:

![image-20241209174037655.png](https://alioss.timecho.com/docs/img/image-20241209174037655.png)

### Notes

1. Region migration is supported only in IoTDB 1.3.3 and later.
2. The IoTConsensus and Ratis protocols are currently supported (see `schema_region_consensus_protocol_class` and `data_region_consensus_protocol_class` in `iotdb-system.properties`).
3. Region migration consumes system resources such as disk and network bandwidth. The process does not block reads or writes, but it may reduce read and write speed.
4. Region migration blocks the deletion of WAL files that belong to the migrating consensus group. If the total size of the WAL files reaches `wal_file_size_threshold_in_byte`, writes are blocked.

### Instructions for use

- **Syntax definition**:
```SQL
migrateRegion
: MIGRATE REGION regionId=INTEGER_LITERAL FROM fromId=INTEGER_LITERAL TO toId=INTEGER_LITERAL
;
```
- **Meaning**: Migrates a region from one DataNode to another.
- **Example**: Migrate region 1 from DataNode 2 to DataNode 3:
```SQL
IoTDB> migrate region 1 from 2 to 3
Msg: The statement is executed successfully.
```

"The statement is executed successfully" only means that the region migration task was submitted successfully, not that it has finished. Use the CLI command `show regions` to check the execution status of the task.
- **Related configuration**:
  - Migration speed control: set the `region_migration_speed_limit_bytes_per_second` parameter in `iotdb-system.properties` to limit the region migration speed.
- **Time cost estimation**:
  - If there are no concurrent writes during migration, the time can be estimated by dividing the region data volume by the data transfer speed. For example, for a 1 TB region, if disk bandwidth, network bandwidth, and the speed limit parameter together yield a data transfer speed of 100 MB/s, the migration takes about 3 hours.
  - If there are concurrent writes during migration, the time increases. The exact figure depends on factors such as write pressure and system resources. A simple estimate is `time without concurrent writes × 1.5`.
- **Migration progress observation**: During migration, use the CLI command `show regions` to observe state changes. Taking 2 replicas as an example, the consensus group that contains the region goes through the following states:
  - Before migration starts: `Running`, `Running`.
  - Expansion phase: `Running`, `Running`, `Adding`. This phase transfers a large number of files and may take a long time. With IoTConsensus, search the DataNode log for `[SNAPSHOT TRANSMISSION]` to see the file transfer progress.
  - Shrink phase: `Removing`, `Running`, `Running`.
  - Migration complete: `Running`, `Running`.

Taking the expansion phase as an example, the output of `show regions` may look like this:

```Plain
IoTDB> show regions
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort|InternalAddress| Role| CreateTime|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
| 0|SchemaRegion|Running| root.ln| 1| 0| 1| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 2| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 3| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 1| DataRegion|Running| root.ln| 1| 1| 1| 0.0.0.0| 6667| 127.0.0.1| Leader|2024-04-15T18:55:19.457|
| 1| DataRegion|Running| root.ln| 1| 1| 2| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
| 1| DataRegion| Adding| root.ln| 1| 1| 3| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
Total line number = 6
It costs 0.003s
```
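
The time cost rule of thumb above can be sketched as a small calculation. This is only an illustration of the arithmetic in this section, not part of IoTDB: the function name `estimate_migration_seconds` is hypothetical, decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are an assumption, and the 1.5 factor is the rough multiplier quoted above for concurrent writes.

```python
def estimate_migration_seconds(region_bytes: float,
                               transfer_bytes_per_second: float,
                               concurrent_writes: bool = False) -> float:
    """Rough migration time: data volume divided by transfer speed.

    Hypothetical helper for illustration only. The effective transfer
    speed must be supplied by the caller; it is bounded by disk bandwidth,
    network bandwidth, and region_migration_speed_limit_bytes_per_second.
    """
    if transfer_bytes_per_second <= 0:
        raise ValueError("transfer speed must be positive")
    base = region_bytes / transfer_bytes_per_second
    # Rule of thumb from this section: concurrent writes add about 50%.
    return base * 1.5 if concurrent_writes else base


TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

idle = estimate_migration_seconds(1 * TB, 100 * MB)
busy = estimate_migration_seconds(1 * TB, 100 * MB, concurrent_writes=True)
print(f"without concurrent writes: {idle / 3600:.1f} h")
print(f"with concurrent writes:    {busy / 3600:.1f} h")
```

For the 1 TB example this gives 10,000 seconds, a little under 3 hours, and 15,000 seconds with concurrent writes. Treat both numbers as planning estimates only.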

## Start/Stop Repair Data Statements
Used to repair unsorted data generated by system bugs.
69 changes: 69 additions & 0 deletions src/UserGuide/latest/User-Manual/Maintennance.md
@@ -346,6 +346,75 @@ Observing the results, we found that it is because the query did not add a time

The final optimization plan is: Add a time filtering condition to avoid a full table scan.

## Region Migration

Region migration carries an operational cost. Read this section in full before using the feature. If you have questions about the solution design, contact the IoTDB team for technical support.

### Feature introduction

IoTDB is a distributed database.
Balanced data distribution is important for balancing disk space usage
and write pressure across the cluster.
A region is the basic unit of distributed data storage in an IoTDB cluster.
For the concept in detail, see [region](../../Background-knowledge/Cluster-Concept.md).

During normal operation, the IoTDB kernel balances the data load automatically. In some scenarios, such as when a new DataNode joins the cluster or when data must be recovered after a disk failure on a DataNode's machine, regions need to be migrated manually to adjust the cluster load at a finer granularity.

The following diagram illustrates the region migration process:

![image-20241209174037655.png](https://alioss.timecho.com/docs/img/image-20241209174037655.png)

### Notes

1. Region migration is supported only in IoTDB 1.3.3 and later.
2. The IoTConsensus and Ratis protocols are currently supported (see `schema_region_consensus_protocol_class` and `data_region_consensus_protocol_class` in `iotdb-system.properties`).
3. Region migration consumes system resources such as disk and network bandwidth. The process does not block reads or writes, but it may reduce read and write speed.
4. Region migration blocks the deletion of WAL files that belong to the migrating consensus group. If the total size of the WAL files reaches `wal_file_size_threshold_in_byte`, writes are blocked.

### Instructions for use

- **Syntax definition**:
```SQL
migrateRegion
: MIGRATE REGION regionId=INTEGER_LITERAL FROM fromId=INTEGER_LITERAL TO toId=INTEGER_LITERAL
;
```
- **Meaning**: Migrates a region from one DataNode to another.
- **Example**: Migrate region 1 from DataNode 2 to DataNode 3:
```SQL
IoTDB> migrate region 1 from 2 to 3
Msg: The statement is executed successfully.
```

"The statement is executed successfully" only means that the region migration task was submitted successfully, not that it has finished. Use the CLI command `show regions` to check the execution status of the task.
- **Related configuration**:
  - Migration speed control: set the `region_migration_speed_limit_bytes_per_second` parameter in `iotdb-system.properties` to limit the region migration speed.
- **Time cost estimation**:
  - If there are no concurrent writes during migration, the time can be estimated by dividing the region data volume by the data transfer speed. For example, for a 1 TB region, if disk bandwidth, network bandwidth, and the speed limit parameter together yield a data transfer speed of 100 MB/s, the migration takes about 3 hours.
  - If there are concurrent writes during migration, the time increases. The exact figure depends on factors such as write pressure and system resources. A simple estimate is `time without concurrent writes × 1.5`.
- **Migration progress observation**: During migration, use the CLI command `show regions` to observe state changes. Taking 2 replicas as an example, the consensus group that contains the region goes through the following states:
  - Before migration starts: `Running`, `Running`.
  - Expansion phase: `Running`, `Running`, `Adding`. This phase transfers a large number of files and may take a long time. With IoTConsensus, search the DataNode log for `[SNAPSHOT TRANSMISSION]` to see the file transfer progress.
  - Shrink phase: `Removing`, `Running`, `Running`.
  - Migration complete: `Running`, `Running`.

Taking the expansion phase as an example, the output of `show regions` may look like this:

```Plain
IoTDB> show regions
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort|InternalAddress| Role| CreateTime|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
| 0|SchemaRegion|Running| root.ln| 1| 0| 1| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 2| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 3| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 1| DataRegion|Running| root.ln| 1| 1| 1| 0.0.0.0| 6667| 127.0.0.1| Leader|2024-04-15T18:55:19.457|
| 1| DataRegion|Running| root.ln| 1| 1| 2| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
| 1| DataRegion| Adding| root.ln| 1| 1| 3| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
Total line number = 6
It costs 0.003s
```
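
When checking `show regions` repeatedly, it can help to map the `Status` values of one consensus group to the phases described above. The sketch below is a hypothetical helper, not an IoTDB API: the function name `migration_phase` and the phase labels it returns are assumptions made for illustration, and it expects the caller to have already collected the `Status` column for the replicas of a single region.

```python
from collections import Counter
from typing import Iterable


def migration_phase(replica_statuses: Iterable[str]) -> str:
    """Classify one consensus group by the Status values of its replicas.

    Hypothetical helper: the labels "expansion", "shrink", "steady" and
    "unknown" are illustrative names, not values reported by IoTDB.
    """
    counts = Counter(status.strip() for status in replica_statuses)
    if counts["Adding"] > 0:
        # A new replica is still receiving data on the target DataNode.
        return "expansion"
    if counts["Removing"] > 0:
        # The replica on the source DataNode is being removed.
        return "shrink"
    if counts and set(counts) == {"Running"}:
        # Either migration has not started or it has completed.
        return "steady"
    return "unknown"


# Status column of region 1 in the sample output above.
print(migration_phase(["Running", "Running", "Adding"]))
```

Because the states before and after migration look the same (`Running`, `Running`), a status check alone cannot tell them apart. Comparing the `DataNodeId` values against the migration target is one way to confirm completion.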

## Start/Stop Repair Data Statements
Used to repair unsorted data generated by system bugs.
77 changes: 77 additions & 0 deletions src/zh/UserGuide/Master/Tree/User-Manual/Maintennance.md
@@ -327,6 +327,83 @@ select count(s1) as total from root.db.d1 where s1 like '%XXXXXXXX%'

The final optimization plan is: add a time filter condition to avoid a full table scan.

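## Region Migration

Region migration carries an operational cost. Read this section in full before using the feature. If you have questions about the solution design, contact the IoTDB team for technical support.

### Feature introduction

IoTDB is a distributed database. Balanced data distribution is important for balancing disk space usage and write pressure across the cluster. A region is the basic unit of distributed data storage in an IoTDB cluster.
For the concept in detail, see [region](../../Background-knowledge/Cluster-Concept.md).

During normal operation, the IoTDB kernel balances the data load automatically. In some scenarios, such as when a new DataNode joins the cluster or when data must be recovered after a disk failure on a DataNode's machine, regions need to be migrated manually to adjust the cluster load at a finer granularity.

The following diagram illustrates a region migration:


![image-20241209174037655.png](https://alioss.timecho.com/docs/img/image-20241209174037655.png)

### Notes

1. Region migration is supported only in IoTDB 1.3.3 and later.
2. The IoTConsensus and Ratis protocols are currently supported. See `schema_region_consensus_protocol_class` and `data_region_consensus_protocol_class` in `iotdb-system.properties`.
3. Region migration consumes system resources such as disk and network bandwidth. The process does not block reads or writes, but it may reduce read and write speed.
4. Region migration blocks the deletion of WAL files that belong to the migrating consensus group. If the total size of the WAL files reaches `wal_file_size_threshold_in_byte`, writes are blocked.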

### Instructions for use

- **Syntax definition**
```SQL
migrateRegion
: MIGRATE REGION regionId=INTEGER_LITERAL FROM fromId=INTEGER_LITERAL TO toId=INTEGER_LITERAL
;
```

- **Meaning**: Migrates a region from one DataNode to another.

- **Example**: Migrate region 1 from DataNode 2 to DataNode 3
```SQL
IoTDB> migrate region 1 from 2 to 3
Msg: The statement is executed successfully.
```

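"The statement is executed successfully" only means that the region migration task was submitted successfully, not that it has finished. Use the CLI command `show regions` to check the execution status of the task.

- **Related configuration**

  - Migration speed control: set the `region_migration_speed_limit_bytes_per_second` parameter in `iotdb-system.properties` to limit the region migration speed.

- **Time cost estimation**
  - If there are no concurrent writes during migration, the time can be estimated by dividing the region data volume by the data transfer speed. For example, for a 1 TB region, if disk bandwidth, network bandwidth, and the speed limit parameter together yield a data transfer speed of 100 MB/s, the migration takes about 3 hours.
  - If there are concurrent writes during migration, the time increases. The exact figure depends on factors such as write pressure and system resources. A simple estimate is `time without concurrent writes × 1.5`.

- **Migration progress observation**: During migration, use the CLI command `show regions` to observe state changes. Taking 2 replicas as an example, the consensus group that contains the region goes through the following states:
  - Before migration starts: `Running`, `Running`

  - Expansion phase: `Running`, `Running`, `Adding`. This phase transfers a large number of files and may take a long time. To see the progress, search the DataNode log for `[SNAPSHOT TRANSMISSION]`

  - Shrink phase: `Removing`, `Running`, `Running`

  - Migration complete: `Running`, `Running`

Taking the expansion phase as an example, the output of `show regions` may look like this: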

```Plain
IoTDB> show regions
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort|InternalAddress| Role| CreateTime|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
| 0|SchemaRegion|Running| root.ln| 1| 0| 1| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 2| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 0|SchemaRegion|Running| root.ln| 1| 0| 3| 0.0.0.0| 6668| 127.0.0.1| Leader|2024-04-15T18:55:17.691|
| 1| DataRegion|Running| root.ln| 1| 1| 1| 0.0.0.0| 6667| 127.0.0.1| Leader|2024-04-15T18:55:19.457|
| 1| DataRegion|Running| root.ln| 1| 1| 2| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
| 1| DataRegion| Adding| root.ln| 1| 1| 3| 0.0.0.0| 6668| 127.0.0.1|Follower|2024-04-15T18:55:19.457|
+--------+------------+-------+--------+-------------+-----------+----------+----------+-------+---------------+--------+-----------------------+
Total line number = 6
It costs 0.003s
```

## Start/Stop Repair Data Statements
Used to repair out-of-order data caused by system bugs.
### START REPAIR DATA