Doc: document for taos-tools #29143

Draft
wants to merge 21 commits into
base: main
34 changes: 18 additions & 16 deletions docs/en/14-reference/02-tools/09-taosdump.md
@@ -4,22 +4,17 @@ sidebar_label: taosdump
slug: /tdengine-reference/tools/taosdump
---

taosdump is a tool application that supports backing up data from a running TDengine cluster and restoring the backed-up data to the same or another running TDengine cluster.

taosdump can back up data using databases, supertables, or basic tables as logical data units, and can also back up data records within a specified time period from databases, supertables, and basic tables. You can specify the directory path for data backup; if not specified, taosdump defaults to backing up data to the current directory.

If the specified location already has data files, taosdump will prompt the user and exit immediately to avoid data being overwritten. This means the same path can only be used for one backup.
If you see related prompts, please operate carefully.

taosdump is a logical backup tool, it should not be used to back up any raw data, environment settings, hardware information, server configuration, or cluster topology. taosdump uses [Apache AVRO](https://avro.apache.org/) as the data file format to store backup data.
`taosdump` is a TDengine data backup and recovery tool provided for open source users. Backed-up data files use the standard [Apache AVRO](https://avro.apache.org/)
format, which makes it convenient to exchange data with the external ecosystem.
taosdump provides multiple data backup and recovery options to meet different needs; all supported options can be viewed with `--help`.

## Installation

There are two ways to install taosdump:
taosdump provides two installation methods:

- Install the official taosTools package, please find taosTools on the [release history page](../../../release-history/taostools/) and download it for installation.
- taosdump is installed by default as a component of the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).

- Compile taos-tools separately and install, please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository for details.
- Compile and install taos-tools separately; refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

## Common Use Cases

@@ -30,6 +25,9 @@ There are two ways to install taosdump:
3. Backup certain supertables or basic tables in a specified database: use the `dbname stbname1 stbname2 tbname1 tbname2 ...` parameter, note that this input sequence starts with the database name, supports only one database, and the second and subsequent parameters are the names of the supertables or basic tables in that database, separated by spaces;
4. Backup the system log database: TDengine clusters usually include a system database named `log`, which contains data for TDengine's own operation, taosdump does not back up the log database by default. If there is a specific need to back up the log database, you can use the `-a` or `--allow-sys` command line parameter.
5. "Tolerant" mode backup: Versions after taosdump 1.4.1 provide the `-n` and `-L` parameters, used for backing up data without using escape characters and in "tolerant" mode, which can reduce backup data time and space occupied when table names, column names, and label names do not use escape characters. If unsure whether to use `-n` and `-L`, use the default parameters for "strict" mode backup. For an explanation of escape characters, please refer to the [official documentation](../../sql-manual/escape-characters/).
6. If backup files already exist in the directory specified by the `-o` parameter, taosdump reports an error and exits to prevent data from being overwritten. Use another empty directory, or clear the existing data, before backing up.
7. taosdump does not currently support resuming an interrupted backup. If a backup is interrupted, it must be restarted from scratch.
If a backup takes a long time, it is recommended to use the `-S` and `-E` options to specify start and end times and back up the data in segments.
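
The segmented backup suggested in item 7 can be sketched as a dry-run script that prints one taosdump command per time window instead of running it. The database name `test`, the base directory `/tmp/taosdump-backup`, and the window boundaries are hypothetical example values. Whether the `-S`/`-E` boundaries are inclusive is not stated here, so rows that fall exactly on a boundary should be checked for duplication. Each window gets its own output directory, because taosdump exits if the `-o` directory already contains backup files.

```shell
#!/bin/sh
# Dry-run sketch: print one taosdump command per time window so the plan
# can be reviewed before anything is executed.
# DB, OUT_BASE, and BOUNDARIES are hypothetical example values.
DB="test"
OUT_BASE="/tmp/taosdump-backup"
BOUNDARIES="2024-01-01T00:00:00.000+0800 2024-02-01T00:00:00.000+0800 2024-03-01T00:00:00.000+0800"

print_segment_cmds() {
  prev=""
  n=0
  for ts in $BOUNDARIES; do
    if [ -n "$prev" ]; then
      n=$((n + 1))
      # One directory per segment: taosdump refuses to write into a
      # directory that already holds backup files.
      echo "taosdump -D $DB -S $prev -E $ts -o $OUT_BASE/segment-$n"
    fi
    prev="$ts"
  done
}

print_segment_cmds
```

If a segment is interrupted, only that segment needs to be repeated, using a fresh or emptied directory.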

:::tip

@@ -42,7 +40,8 @@ There are two ways to install taosdump:

### taosdump Restore Data

Restore data files from a specified path: use the `-i` parameter along with the data file path. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times, otherwise, the backup data will cause overwriting or multiple backups.
- Restore data files from a specified path: use the `-i` parameter along with the data file path. As mentioned earlier, the same directory should not be used to back up different data sets, nor should the same path be used to back up the same data set multiple times; otherwise the backup data will be overwritten or duplicated.
- taosdump can restore data into a database with a new name by using the `-W` parameter; see the command line parameter description for details.
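
As a rough illustration of the `-W` option, the sketch below builds a rename list in the `db1=newDB1|db2=newDB2` form shown in the help text and prints the resulting restore command. The helper `build_rename_list` and the backup path `/tmp/taosdump-backup` are hypothetical; only `-i` and `-W` come from the documentation.

```shell
#!/bin/sh
# Join "old=new" pairs with "|" to form a RENAME-LIST for -W.
# build_rename_list is a hypothetical helper, not part of taosdump.
build_rename_list() {
  list=""
  for pair in "$@"; do
    if [ -z "$list" ]; then
      list="$pair"
    else
      list="$list|$pair"
    fi
  done
  printf '%s\n' "$list"
}

RENAMES=$(build_rename_list "db1=newDB1" "db2=newDB2")

# Print the command rather than running it. The list must stay quoted,
# otherwise the shell would interpret "|" as a pipe.
echo "taosdump -i /tmp/taosdump-backup -W \"$RENAMES\""
```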

:::tip
taosdump internally uses the TDengine stmt binding API to write restored data, currently using 16384 as a batch for writing. If there are many columns in the backup data, it may cause a "WAL size exceeds limit" error, in which case you can try adjusting the `-B` parameter to a smaller value.
@@ -105,17 +104,20 @@ Usage: taosdump [OPTION...] dbname [tbname ...]
the table name.(Version 2.5.3)
-T, --thread-num=THREAD_NUM Number of thread for dump in file. Default is
8.
-W, --rename=RENAME-LIST Rename database name with new name during
importing data. RENAME-LIST:
"db1=newDB1|db2=newDB2" means rename db1 to newDB1
and rename db2 to newDB2 (Version 2.5.4)
-k, --retry-count=VALUE Set the number of retry attempts for connection or
query failures
-z, --retry-sleep-ms=VALUE retry interval sleep time, unit ms
-C, --cloud=CLOUD_DSN specify a DSN to access TDengine cloud service
-R, --restful Use RESTful interface to connect TDengine
-t, --timeout=SECONDS The timeout seconds for websocket to interact.
-g, --debug Print debug info.
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
-W, --rename=RENAME-LIST Rename database name with new name during
importing data. RENAME-LIST:
"db1=newDB1|db2=newDB2" means rename db1 to newDB1
and rename db2 to newDB2 (Version 2.5.4)

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
103 changes: 75 additions & 28 deletions docs/en/14-reference/02-tools/10-taosbenchmark.md
@@ -4,35 +4,38 @@ sidebar_label: taosBenchmark
slug: /tdengine-reference/tools/taosbenchmark
---

taosBenchmark (formerly known as taosdemo) is a tool for testing the performance of the TDengine product. taosBenchmark can test the performance of TDengine's insert, query, and subscription functions. It can simulate massive data generated by a large number of devices and flexibly control the number of databases, supertables, types and number of tag columns, types and number of data columns, number of subtables, data volume per subtable, data insertion interval, number of working threads in taosBenchmark, whether and how to insert out-of-order data, etc. To accommodate the usage habits of past users, the installation package provides taosdemo as a soft link to taosBenchmark.
taosBenchmark is a performance benchmarking tool for TDengine. It tests insert, query, and subscription performance and outputs performance metrics.

## Installation

There are two ways to install taosBenchmark:
taosBenchmark provides two installation methods:

- taosBenchmark is automatically installed with the official TDengine installation package, for details please refer to [TDengine Installation](../../../get-started/).
- taosBenchmark is installed by default as a component of the TDengine installation package and can be used once TDengine is installed. For how to install TDengine, refer to [TDengine Installation](../../../get-started/).

- Compile and install taos-tools separately, for details please refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.
- Compile and install taos-tools separately; refer to the [taos-tools](https://github.com/taosdata/taos-tools) repository.

## Operation

### Configuration and Operation Methods

taosBenchmark needs to be executed in the operating system's terminal, and this tool supports two configuration methods: Command Line Arguments and JSON Configuration File. These two methods are mutually exclusive; when using a configuration file, only one command line argument `-f <json file>` can be used to specify the configuration file. When using command line arguments to run taosBenchmark and control its behavior, the `-f` parameter cannot be used; instead, other parameters must be used for configuration. In addition, taosBenchmark also offers a special mode of operation, which is running without any parameters.

taosBenchmark supports comprehensive performance testing for TDengine, and the TDengine features it supports are divided into three categories: writing, querying, and subscribing. These three functions are mutually exclusive, and each run of taosBenchmark can only select one of them. It is important to note that the type of function to be tested is not configurable when using the command line configuration method; the command line configuration method can only test writing performance. To test TDengine's query and subscription performance, you must use the configuration file method and specify the type of function to be tested through the `filetype` parameter in the configuration file.
taosBenchmark supports three operating modes:

- No-parameter mode
- Command line mode
- JSON configuration file mode

Command line mode offers a subset of the functionality of JSON configuration file mode. When both are used together, parameters specified on the command line take precedence over those in the configuration file.

**Ensure that the TDengine cluster is running correctly before running taosBenchmark.**

### Running Without Command Line Arguments

Execute the following command to quickly experience taosBenchmark performing a write performance test on TDengine based on the default configuration.

Execute the following command to quickly experience taosBenchmark performing a write performance test on TDengine based on the default configuration.
```shell
taosBenchmark
```

When running without parameters, taosBenchmark by default connects to the TDengine cluster specified under `/etc/taos`, and creates a database named `test` in TDengine, under which a supertable named `meters` is created, and 10,000 tables are created under the supertable, each table having 10,000 records inserted. Note that if a `test` database already exists, this command will delete the existing database and create a new `test` database.
When run without parameters, taosBenchmark connects by default to the TDengine cluster specified in `/etc/taos/taos.cfg`.
After connecting successfully, it creates the smart meter example database `test`, the supertable `meters`, and 10,000 subtables, and inserts 10,000 records into each subtable. If the `test` database already exists, it is deleted before the new one is created.

### Running Using Command Line Configuration Parameters

@@ -46,9 +49,7 @@ The above command `taosBenchmark` will create a database named `test`, establish

### Running Using a Configuration File

The taosBenchmark installation package includes examples of configuration files, located in `<install_directory>/examples/taosbenchmark-json`

Use the following command line to run taosBenchmark and control its behavior through a configuration file.
Configuration file mode provides all taosBenchmark functions; every parameter is set in the configuration file.

```shell
taosBenchmark -f <json file>
@@ -214,6 +215,61 @@ taosBenchmark -A INT,DOUBLE,NCHAR,BINARY\(16\)
- **-?/--help**:
Displays help information and exits. Cannot be used with other parameters.


## Output Performance Metrics

#### Write Metrics

After writing completes, summary performance metrics are printed in the last two lines of output, in the following format:
``` bash
SUCC: Spent 8.527298 (real 8.117379) seconds to insert rows: 10000000 with 8 thread(s) into test 1172704.41 (real 1231924.74) records/second
SUCC: insert delay, min: 19.6780ms, avg: 64.9390ms, p90: 94.6900ms, p95: 105.1870ms, p99: 130.6660ms, max: 157.0830ms
```
The first line reports write speed statistics:
- Spent: total write time in seconds, measured from the start of writing the first record to the end of writing the last record. Here, a total of 8.527298 seconds was spent.
- real: total write time spent in engine calls only, excluding the time the test framework spends preparing data. Here it is 8.117379 seconds, so 8.527298 - 8.117379 = 0.409919 seconds were spent by the test framework preparing data.
- rows: total number of rows written, here 10 million.
- thread(s): number of writing threads, here 8 threads writing concurrently.
- records/second: write speed = `total number of rows written` / `total write time`. The `real` value in parentheses has the same meaning as above and indicates the pure engine write speed.

The second line reports latency statistics for individual writes:
- min: minimum write latency
- avg: average write latency
- p90: 90th percentile write latency
- p95: 95th percentile write latency
- p99: 99th percentile write latency
- max: maximum write latency

Together, these metrics show the distribution of write request latency.
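
The relationship between the figures in the first summary line can be checked with a short script. This is a sketch that assumes the summary line has exactly the whitespace-separated layout of the sample above; `write_stats` is a hypothetical helper, not part of taosBenchmark.

```shell
#!/bin/sh
# Sample summary line copied from the output above.
line='SUCC: Spent 8.527298 (real 8.117379) seconds to insert rows: 10000000 with 8 thread(s) into test 1172704.41 (real 1231924.74) records/second'

# Hypothetical helper: recompute speed as rows / time for both the total
# time and the engine-only ("real") time, plus the data preparation time.
# Field positions ($3, $5, $10) assume the exact layout of the sample line.
write_stats() {
  printf '%s\n' "$1" | awk '{
    spent = $3 + 0                      # total write time, seconds
    real = $5                           # engine-only time, e.g. "8.117379)"
    gsub(/[()]/, "", real)              # strip the parenthesis
    real += 0
    rows = $10 + 0                      # total rows written
    printf "%.2f %.2f %.6f\n", rows / spent, rows / real, spent - real
  }'
}

write_stats "$line"
```

The three printed values are the overall records/second, the engine-only records/second, and the time spent preparing data, which can be compared with the figures reported in the sample line.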

#### Query Metrics
The query performance test mainly reports QPS (queries per second), in the following format:

``` bash
complete query with 3 threads and 10000 query delay avg: 0.002686s min: 0.001182s max: 0.012189s p90: 0.002977s p95: 0.003493s p99: 0.004645s SQL command: select ...
INFO: Total specified queries: 30000
INFO: Spend 26.9530 second completed total queries: 30000, the QPS of all threads: 1113.049
```

- The first line gives the latency statistics (average, minimum, maximum, and percentiles) of the query requests for the 3 threads, each of which executes 10,000 queries. `SQL command` is the query statement under test.
- The second line shows that a total of 10000 * 3 = 30000 queries were completed.
- The third line shows that the total query time was 26.9530 seconds and the query rate (QPS) was 1113.049 queries per second.
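
The totals in the sample output can be reproduced from the thread count, the per-thread query count, and the elapsed time. The helper name `query_stats` is hypothetical, and the sketch assumes QPS is simply total queries divided by total elapsed seconds.

```shell
#!/bin/sh
# Hypothetical helper: given threads, queries per thread, and elapsed
# seconds, print the total number of queries and the resulting QPS.
query_stats() {
  threads=$1
  query_times=$2
  elapsed=$3
  total=$((threads * query_times))
  awk -v total="$total" -v elapsed="$elapsed" \
    'BEGIN { printf "%d %.3f\n", total, total / elapsed }'
}

# Values taken from the sample output above.
query_stats 3 10000 26.9530
```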

#### Subscription Metrics

The subscription performance test mainly reports the consumption speed of each consumer, in the following format:
``` bash
INFO: consumer id 0 has poll total msgs: 376, period rate: 37.592 msgs/s, total rows: 3760000, period rate: 375924.815 rows/s
INFO: consumer id 1 has poll total msgs: 362, period rate: 36.131 msgs/s, total rows: 3620000, period rate: 361313.504 rows/s
INFO: consumer id 2 has poll total msgs: 364, period rate: 36.378 msgs/s, total rows: 3640000, period rate: 363781.731 rows/s
INFO: consumerId: 0, consume msgs: 1000, consume rows: 10000000
INFO: consumerId: 1, consume msgs: 1000, consume rows: 10000000
INFO: consumerId: 2, consume msgs: 1000, consume rows: 10000000
INFO: Consumed total msgs: 3000, total rows: 30000000
```
- Lines 1 to 3 are real-time output of each consumer's current consumption speed. `msgs/s` is the number of messages consumed per second, where each message contains multiple rows of data; `rows/s` is the consumption speed measured in rows.
- Lines 4 to 6 show the overall statistics for each consumer after the test completes, including the total number of messages and the total number of rows consumed.
- Line 7 shows the overall statistics for all consumers: `msgs` is the total number of messages consumed, and `rows` is the total number of rows consumed.
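
The line 7 totals are the sums of the per-consumer figures in lines 4 to 6. A minimal sketch of that aggregation, assuming the per-consumer lines have exactly the layout shown in the sample (`sum_consumers` is a hypothetical helper):

```shell
#!/bin/sh
# Per-consumer summary lines copied from the sample output above.
log='INFO: consumerId: 0, consume msgs: 1000, consume rows: 10000000
INFO: consumerId: 1, consume msgs: 1000, consume rows: 10000000
INFO: consumerId: 2, consume msgs: 1000, consume rows: 10000000'

# Hypothetical helper: sum messages and rows over all consumerId lines.
# Commas are removed first so the fields split into clean numbers.
sum_consumers() {
  printf '%s\n' "$1" | awk '
    /consumerId:/ { gsub(/,/, ""); msgs += $6; rows += $9 }
    END { printf "%d %d\n", msgs, rows }'
}

sum_consumers "$log"
```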

## Configuration File Parameters Detailed Explanation

### General Configuration Parameters
@@ -331,21 +387,6 @@ Parameters related to supertable creation are configured in the `super_tables` s
- **repeat_ts_max** : Numeric type, when composite primary key is enabled, specifies the maximum number of records with the same timestamp to be generated
- **sqls** : Array of strings type, specifies the array of sql to be executed after the supertable is successfully created, the table name specified in sql must be prefixed with the database name, otherwise an unspecified database error will occur

#### tsma Configuration Parameters

Specify the configuration parameters for tsma in `super_tables` under `tsmas`, with the following specific parameters:

- **name**: Specifies the name of the tsma, mandatory.

- **function**: Specifies the function of the tsma, mandatory.

- **interval**: Specifies the time interval for the tsma, mandatory.

- **sliding**: Specifies the window time shift for the tsma, mandatory.

- **custom**: Specifies custom configuration appended at the end of the tsma creation statement, optional.

- **start_when_inserted**: Specifies when to create the tsma after how many rows are inserted, optional, default is 0.

#### Tag and Data Column Configuration Parameters

@@ -423,6 +464,11 @@ For other common parameters, see Common Configuration Parameters.

Configuration parameters for querying specified tables (can specify supertables, subtables, or regular tables) are set in `specified_table_query`.

- **mixed_query** : "yes" selects `Mixed Query`, "no" selects `Normal Query`; default is "no".
`Mixed Query`: All SQL statements in `sqls` are divided into groups according to the number of threads, and each thread executes one group. Each SQL statement in a thread is executed `query_times` times.
`Normal Query`: Each SQL statement in `sqls` starts `threads` threads, which exit after executing the statement `query_times` times. The next SQL statement starts only after all threads of the previous statement have finished and exited.
In both `Normal Query` and `Mixed Query` mode, the total number of query executions is the same: total queries = number of `sqls` * `threads` * `query_times`. The difference is that `Normal Query` starts `threads` threads for each SQL statement, while `Mixed Query` starts `threads` threads only once to complete all SQL statements, so the number of thread startups differs.

- **query_interval** : Query interval, in seconds, default is 0.

- **threads** : Number of threads executing the SQL query, default is 1.
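
The difference between the two `mixed_query` modes described above can be put into numbers. The sketch below only restates the formulas from that description; `query_plan` is a hypothetical helper, and the input values (4 SQL statements, 3 threads, 10000 executions) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for a given number of SQL statements, threads, and
# query_times, print the total number of queries and how many threads
# each mode starts, following the description of mixed_query above.
query_plan() {
  sqls=$1
  threads=$2
  query_times=$3
  total=$((sqls * threads * query_times))   # same in both modes
  normal_starts=$((sqls * threads))         # threads started per SQL statement
  mixed_starts=$threads                     # threads started only once
  echo "total=$total normal_thread_starts=$normal_starts mixed_thread_starts=$mixed_starts"
}

query_plan 4 3 10000
```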
@@ -433,7 +479,8 @@ Configuration parameters for querying specified tables (can specify supertables,

#### Configuration Parameters for Querying Supertables

Configuration parameters for querying supertables are set in `super_table_query`.
Configuration parameters for querying supertables are set in `super_table_query`.
The thread mode for supertable queries is the same as the `Normal Query` mode of the specified-table query described above, except that the statements in `sqls` are executed against all subtables.

- **stblname** : The name of the supertable to query, required.
